ABSTRACT

Nonlinear combinations of direct observables are often used to estimate quantities of theoretical
interest. Without sufficient caution, this can lead to biased estimates. An example of great
interest is the skewness $S_3$ of the galaxy distribution, defined as the ratio of the third moment
$\bar{\xi}_3$ to the variance squared $\bar{\xi}_2^2$, smoothed at some scale $R$. Even if one is
given unbiased estimators for $\bar{\xi}_3$ and $\bar{\xi}_2^2$ respectively, taking a ratio of the
two does not necessarily result in an unbiased estimator of $S_3$. Exactly such an estimation-bias
(distinguished from the galaxy-bias) affects most existing measurements of $S_3$ from galaxy
surveys. Furthermore, common estimators for $\bar{\xi}_3$ and $\bar{\xi}_2$ themselves suffer from
this kind of estimation-bias, because of a division by the estimated mean counts-in-cells. In the
case of $\bar{\xi}_2$, the bias is equivalent to what is commonly known as the integral constraint.
We present a unifying treatment that allows all these estimation-biases to be calculated
analytically. These estimation-biases are in general negative, and decrease in significance as the
survey volume increases, for a given smoothing scale. We present a preliminary re-analysis of some
existing measurements of the variance and skewness (from the APM, CfA, SSRS, IRAS) and show that
most of the well-known systematic discrepancies between surveys with similar selection criteria,
but different sizes, can be attributed to the volume-dependent estimation-biases. This affects the
inference of the galaxy-bias(es) from these surveys. Our methodology can be adapted to
measurements of the variance and skewness of, for instance, the transmission distribution in
quasar spectra and the convergence distribution in weak-lensing maps. We discuss generalizations
to N > 3, suggest methods to reduce the estimation-bias, and point out other examples in large
scale structure studies which might suffer from this type of nonlinear-estimation-bias.
1. Introduction

There has been a long history of interest, since the pioneering work of Peebles (1980, §18), in
the hierarchical amplitudes $S_N$, defined as the following ratio:

$$ S_N = \bar{\xi}_N / \bar{\xi}_2^{N-1} \tag{1} $$

where $\bar{\xi}_N$ is the N-th cumulant, defined by $\langle \delta^N \rangle_c$, and $\delta$ is
the density fluctuation smoothed on some scale. These quantities are important as a test of the
gravitational instability paradigm (e.g. Fry 1984; Juszkiewicz et al. 1993; Bernardeau 1994), a
probe of possibly non-Gaussian initial conditions (e.g. Silk & Juszkiewicz 1991; Gaztañaga &
Mähönen 1996; Gaztañaga & Fosalba 1997), as well as a measure of the galaxy-bias (e.g. Fry &
Gaztañaga 1993; Frieman & Gaztañaga 1994).

However, it has also been a puzzle for quite some time that different galaxy surveys yield
discordant values for $S_N$ (e.g. Table 1, for N = 3, 4). While some of the differences no doubt
arise from the fact that galaxies selected in different ways might have different galaxy-biases,
not all of the differences can be convincingly explained away in such a manner. For instance, a
comparison between optically selected galaxy catalogues (in Table 1) reveals a substantial and
systematic difference between the measured values of $S_N$: redshift surveys consistently yield
lower values than the larger angular catalogues (e.g. compare APM/LICK/EDSGC values with those
from CfA/SSRS in Table 1; note that the IRAS galaxies are infrared-selected; note also an
exception to this rule in the measurements by Kim & Strauss 1998). A rather large relative
galaxy-bias between these two sets of catalogues would have to be invoked to reconcile them.

Three alternative explanations are possible. The first is that redshift space distortions tend
to suppress $S_N$, but this has been shown to be insufficient to explain the systematic
differences, especially on large scales (Fry & Gaztañaga 1994). The second is that the local
volume (sampled by the redshift surveys) just happens to have a smaller $S_N$ than the true
$S_N$, which is presumably measured in the larger angular surveys, i.e. the local universe is not
a fair sample (e.g. Gaztañaga 1994). This is related to the question of the homogeneity scale of
our universe, which has been a subject of some debate (see e.g. Wu et al. 1998). The third is to
blame it on the estimator for $S_N$: it yields a value that is on average biased low, with the
bias (distinguish this from the galaxy-bias) getting worse as the survey becomes smaller. We will
demonstrate that the third accounts for a significant fraction of the systematic differences
between the different measurements. While a thorough analysis detailing exactly how much each of
these factors contributes is beyond the scope of this paper, we can safely conclude that
inferences of large sampling fluctuations, or of a large relative galaxy-bias, based on
measurements of $S_3$, are unwarranted.



A very closely related estimation-bias has been discussed before by Colombi et al. (1994)
as a finite-volume effect.^3 They attributed this to the abrupt cut-off of the count probability
at some finite number of particles because of the finite size of a survey. They proposed a way to
correct for this bias by extending the tail of the count probability using a few phenomenological
parameters calibrated from simulations (see also Fry & Gaztañaga 1994; Munshi et al. 1997).
The method works reasonably well for small smoothing scales, but not for large ones, mainly
because the tail of the probability distribution becomes noisy in the latter case (Colombi 1998,
private communication). Moreover, the method is feasible only with a densely sampled survey
(Bouchet et al. 1993). More recently, Szapudi & Colombi (1996) and Colombi et al. (1998) discussed
how a finite survey-volume affects the errors (i.e. the variance around the mean) in the
estimators of the N-th cumulants $\bar{\xi}_N$ (or the related factorial moments), but not the
mean or bias of the estimator for $S_N$.

^3 Related ideas have also been considered by Bromley (1998, private communication).

However, there has been no attempt to explain quantitatively the differences in the measured
$S_N$ from different surveys of similar selection criteria in terms of the finite-volume effect.
This is, perhaps, in part due to the lack of an analytical estimate of the systematic bias of the
standard estimators for $S_N$. As we will show, a remarkably simple statistical fact allows just
such a calculation to be done, while clarifying the origin of this bias, and relating it to other
known biases in large scale structure statistics, such as the integral constraint.

Consider the following elementary statement:

$$ \left\langle \frac{A}{B} \right\rangle \neq \frac{\langle A \rangle}{\langle B \rangle} \tag{2} $$

where $\langle\ \rangle$ denotes ensemble averaging, and A and B are two random variables or
estimators. This statement holds generically, except in special cases such as when A and B are
constants.
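The inequality in eq. (2) can be checked exactly on a very small example. The sketch below is only an illustration: the two-valued distributions for A and B are hypothetical choices made here, not anything taken from a survey, and the averages are computed by enumerating all equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy distributions: A and B are independent and each takes
# the values 1 or 3 with equal probability.
A_VALUES = [Fraction(1), Fraction(3)]
B_VALUES = [Fraction(1), Fraction(3)]


def mean(values):
    """Ensemble average over equally likely outcomes."""
    values = list(values)
    return sum(values) / len(values)


# <A/B>: average of the ratio over all joint outcomes.
mean_of_ratio = mean(a / b for a, b in product(A_VALUES, B_VALUES))
# <A>/<B>: ratio of the two separate averages.
ratio_of_means = mean(A_VALUES) / mean(B_VALUES)

print("<A/B>   =", mean_of_ratio)
print("<A>/<B> =", ratio_of_means)
```

In this independent toy case the ratio is biased high; the sign and size of the difference in general depend on how A and B fluctuate together.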

The standard method of estimating $S_N$ is to form estimates of $\bar{\xi}_N$ and $\bar{\xi}_2$
separately, and then take an appropriate ratio of the two. However, even in the ideal case where
one has an unbiased estimator of the numerator ($\bar{\xi}_N$; let us call this estimator A, i.e.
$\langle A \rangle = \bar{\xi}_N$) and an unbiased estimator of the denominator
($\bar{\xi}_2^{N-1}$; let us call this estimator B, i.e. $\langle B \rangle = \bar{\xi}_2^{N-1}$),^4
taking a ratio of the two estimators does not necessarily result in an unbiased estimator of
$S_N$. This is captured by the statistical statement in eq. (2). We will refer to this kind of
estimation-bias as the ratio-bias.

More generally, nonlinear combinations of unbiased estimators should be treated with great
care. For instance, suppose $\hat{\bar{\xi}}_2$ is an unbiased estimator of $\bar{\xi}_2$ such
that $\langle \hat{\bar{\xi}}_2 \rangle = \bar{\xi}_2$. It is virtually guaranteed that
$\langle (\hat{\bar{\xi}}_2)^2 \rangle \neq (\bar{\xi}_2)^2$, since the two differ by the variance
of the estimator. We will refer to this kind of bias generally as a nonlinear-estimation-bias, of
which the ratio-bias is a particularly simple and common example. In this paper, we will
sometimes abuse the terminology by using the two terms interchangeably.

The well-known integral constraint, in the case of measurements of the two-point function,
can in fact be understood as a ratio-bias. The two-point function $\xi_2(i,j)$, where i and j are
two cells separated by some distance, is by definition $\langle \delta_i \delta_j \rangle$, where
$\delta_i$ is the overdensity at cell i. The catch is that one directly observes only $n_i$, the
number of particles/galaxies in a cell. An estimate of the mean number count $\bar{n}$ has to be
made in order to assign a value $\delta_i$ to each cell. This is generally taken to be
$\sum_i n_i / N_T$, where $N_T$ is the total number of cells; let us call this estimator
$\hat{\bar{n}}$. The problem is, of course, that
$\langle (n_i - \hat{\bar{n}})(n_j - \hat{\bar{n}}) / \hat{\bar{n}}^2 \rangle \neq \xi_2(i,j)$
because of the estimator $\hat{\bar{n}}$ in the denominator. As we will see, this estimation-bias
is in general negative. This is what the integral constraint is about: the measured two-point
function is generally biased low because the mean number density is estimated from the same
survey from which $\xi_2$ is being measured. This turns out to be the ratio-bias in disguise. It
is also easy to see that estimates of $\bar{\xi}_N$ would suffer from a similar bias.

^4 We will show shortly that even this ideal case does not hold in reality.
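A quick way to see the constraint at work: when the mean count is estimated from the same cells, the estimated overdensities sum to zero identically, so the estimated two-point function averaged over all pairs of cells vanishes whatever the true clustering is. The sketch below illustrates this with made-up counts; the function names and the counts are hypothetical and serve only as an example.

```python
def overdensities(counts):
    """Estimated overdensity per cell, using the mean estimated from the same cells."""
    mean_hat = sum(counts) / len(counts)
    return [n / mean_hat - 1.0 for n in counts]


def survey_averaged_two_point(counts):
    """Average of delta_hat_i * delta_hat_j over all pairs of cells (i, j)."""
    delta_hat = overdensities(counts)
    n_cells = len(delta_hat)
    return sum(di * dj for di in delta_hat for dj in delta_hat) / n_cells**2


# Hypothetical, deterministic counts in 50 cells (any non-trivial list would do).
counts = [(7 * i * i + 3 * i) % 23 for i in range(50)]

# Zero up to floating-point rounding, independent of the input counts.
print(survey_averaged_two_point(counts))
```

Since the pair-averaged estimate is forced to zero while the true pair-averaged correlation is positive, the estimate must be biased low somewhere.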

Peebles (1980) first pointed out the integral-constraint-bias for estimating $\xi_2$, but his
treatment only gave the large scale estimation-bias. Bernstein (1994), building on earlier work by
Landy & Szalay (1993), developed a perturbative approach (not perturbative in the usual sense of
small density fluctuations, but perturbative in the small quantity: the average of the two-point
function over the volume of the survey) to compute the full integral-constraint-bias for $\xi_2$,
which accounted for the small-scale bias as well. (See also Kerscher 1998 for a related recent
discussion.) Obviously, the integral-constraint-bias also affects measurements of the one-point
analogue, or the volume-average, of $\xi_2$, i.e. the variance $\bar{\xi}_2$. This
integral-constraint-bias generally decreases in magnitude with increasing survey size, causing
measurements of the correlation length from $\xi_2$ or $\bar{\xi}_2$ to increase with sample
depth, an effect that has been observed before (Davis et al. 1988; Bouchet et al. 1993).

Hence, in the case of the two-point function or its volume average, the finite-volume effect
pointed out by Colombi et al. (1994) is none other than the integral-constraint-bias.



Adopting the techniques of Bernstein (1994), we compute analytically the biases of the standard
estimators for $\bar{\xi}_N$ and $S_N$, for N = 2, 3. The methodology for a general N is presented
in §2. For simplicity, we illustrate how to keep track of the perturbative ordering by going into
the details of the calculation for N = 2, 3. These cases are also of special interest because many
measurements of $\bar{\xi}_2$, $\bar{\xi}_3$ and $S_3$ exist in the literature. We go over the
calculation of the estimation-biases for these three quantities in §3. Readers not interested in
the details can skip much of that section; the main results are in eq. (27), (28) & (29).

We next check in §4 our analytical results using N-body simulations of the SCDM (Standard
Cold-Dark-Matter) and LCDM (Lambda or Low-Density Cold-Dark-Matter) models. Ensemble
averages of the standard estimators for $\bar{\xi}_2$ and $S_3$ are computed by using 10
realizations for each chosen model and for various sample volumes. The overall agreement is
excellent. We also introduce a way to correct the analytical estimates when the estimation-bias
becomes so large that the perturbative approach in §3 breaks down.

In §5, we present a first step towards a re-evaluation of existing measurements of the variance
and skewness from the CfA/SSRS/APM surveys. We study simulated CfA/SSRS/APM catalogues with the
appropriate sizes, which include the effects of redshift distortions as well as sparse sampling.
We then consider a preliminary correction of the existing measurements of the variance and
skewness from these surveys, based on our findings in §3 and §4. The correction is necessarily
model-dependent (dependent on the power spectrum and the galaxy-biasing assumed), but it appears
that most of the systematic differences between these surveys can be explained by the
estimation-biases, under reasonable assumptions. A thorough analysis of the remaining differences
would require a careful study of, among other things, projection effects and redshift
distortions. We leave this for future work.

Finally, we conclude in §6 with a discussion of methods for measuring $S_N$ that might be subject
to a less severe estimation-bias. We list other large scale structure statistics which might also
suffer from this type of nonlinear-estimation-bias. We also discuss applications of our findings
outside conventional galaxy surveys, such as the Lyman-alpha forest, high-redshift Lyman-break
galaxy surveys and weak-lensing maps.



2. Biases of the Standard Estimators for $\bar{\xi}_N$ and $S_N$
2.1. Definitions 

The standard estimator for $S_N$ is given by:

$$ \hat{S}_N = \hat{\bar{\xi}}_N / (\hat{\bar{\xi}}_2)^{N-1} . \tag{3} $$

We use a hat to denote estimators of the quantities we are interested in.
$\hat{\bar{\xi}}_2$ is the estimator for the variance:

$$ \hat{\bar{\xi}}_2 = \frac{1}{N_T} \sum_i (\hat{\delta}_i)^2 - \hat{\bar{\xi}}_2^{\rm shot} . \tag{4} $$

Imagine that the survey is divided into many very small cells, so small that the number of
particles/galaxies in each cell is either 1 or 0. The index i above denotes such a cell, and $N_T$
is the total number of such cells. $\hat{\delta}_i$ is an estimate of the local overdensity
smoothed over some given radius R. We will assume top-hat smoothing in this paper. In other words:

$$ \hat{\delta}_i = \sum_j \frac{n_j - \hat{\bar{n}}}{\hat{\bar{n}}} \, W_T(i,j) , \tag{5} $$

where $n_j$ is equal to 1 if there is a galaxy and 0 otherwise, $\hat{\bar{n}}$ is an estimate of
the mean density of the survey, and $W_T(i,j)$ is the top-hat smoothing window. The estimator
$\hat{\bar{n}}$ is

$$ \hat{\bar{n}} = \frac{1}{N_T} \sum_i \sum_j n_j \, W_T(i,j) . \tag{6} $$

We have not stated explicitly how to handle edge-effects. For instance, what should be done
when the top-hat window $W_T$ overlaps with the boundary in eq. (5)? Note that
$\sum_j W_T(i,j) = 1$ only for i sufficiently far away from edges. If one adopts the strategy of
picking only those i's in eq. (5) for which the top-hat does not overlap with the boundary, then
the corresponding $N_T$ in eq. (4) should not be the total number of all infinitesimal cells in
the sample, but a smaller number: the number of centers (at the infinitesimal cells) of top-hats
whose windows do not overlap with the boundary. The j-index in eq. (5) should, on the other hand,
range over the whole survey volume, up to the boundary.

The standard shot-noise correction is given by

$$ \hat{\bar{\xi}}_2^{\rm shot} = \frac{1}{\hat{N}_R} , \tag{7} $$

where $\hat{N}_R$ is the estimated mean number of particles within a smoothing top-hat of size R;
in other words, it is $\hat{\bar{n}} V_R$, where $V_R$ is the volume of the top-hat.

Phrased in the above manner, the estimator $\hat{\bar{\xi}}_2$ is equivalent to the standard
estimator for the variance using the counts-in-cells method, infinitely sampled (see Szapudi 1998).
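As a concrete illustration of eq. (4) and (7), here is a minimal counts-in-cells sketch. It assumes a finite list of counts in equal-size smoothing cells rather than the infinitely sampled limit described above, and the function name and example counts are hypothetical.

```python
def variance_estimator(cell_counts):
    """Shot-noise-corrected variance from counts in equal-size cells.

    Mirrors eq. (4) and (7): the mean count per cell is estimated from the
    same cells, and 1 / (estimated mean count) is subtracted as shot noise.
    """
    n_cells = len(cell_counts)
    mean_count_hat = sum(cell_counts) / n_cells  # plays the role of N_R-hat
    delta_hat = [n / mean_count_hat - 1.0 for n in cell_counts]
    raw_second_moment = sum(d * d for d in delta_hat) / n_cells
    shot_noise = 1.0 / mean_count_hat
    return raw_second_moment - shot_noise


# Hypothetical counts in four smoothing cells of radius R:
# second moment 1.0, shot-noise term 1/5, so the estimate is 0.8.
print(variance_estimator([0, 10, 0, 10]))
```

Note that the estimated mean count enters twice, in the overdensities and in the shot-noise term; both uses contribute to the bias discussed in this paper.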

The estimator for the N-th cumulant is defined similarly:

$$ \hat{\bar{\xi}}_N = \frac{1}{N_T} \sum_i (\hat{\delta}_i^N)_c - \hat{\bar{\xi}}_N^{\rm shot} , \tag{8} $$

where the subscript c denotes the connected part of the sum, and $\hat{\bar{\xi}}_N^{\rm shot}$ is
a shot-noise correction generalizing $\hat{\bar{\xi}}_2^{\rm shot}$. It is worthwhile at this
point to introduce the continuum notation. We will replace $\sum_i / N_T$ by $\int dV_i / V_T$,
where $dV_i$ is the volume of the i-th cell and $V_T$ is the total volume, and $W_T(i,j)$ by the
continuum top-hat $W(i,j)$, normalized so that $\int dV_j \, W(i,j) = 1$ for i sufficiently far
away from edges.

It is also worthwhile to note that factorial moments are sometimes used to estimate
$\bar{\xi}_N$, which represents a convenient way to eliminate the shot-noise contribution
(Szapudi & Szalay 1993), but otherwise results in the same estimator for $\bar{\xi}_N$ as in
eq. (8).



2.2. Derivation of the Estimation-Biases: an Outline

The integral-constraint-bias for $\hat{\bar{\xi}}_2$ arises from the fact that the true mean
density $\bar{n}$ is unknown, and has to be estimated from the same sample from which one tries to
measure the variance. To derive it, let us first write the estimator for the mean density as
$\hat{\bar{n}} = \bar{n}(1 + \alpha)$, where $\hat{\bar{n}}$ is given in eq. (6), and $\alpha$ is a
small fluctuation from the true mean. Then one can express the estimator for the overdensity
$\hat{\delta}_i$ (eq. [5]) as

$$ \hat{\delta}_i = (\delta_i - \alpha)(1 - \alpha + \alpha^2 - \dots) , \tag{9} $$

where $\delta_i$ is now the true overdensity (smoothed with the same top-hat as $\hat{\delta}_i$
is), i.e. $\delta_i$ is $n_i / \bar{n} - 1$, appropriately smoothed. $\alpha$ is by definition
equal to

$$ \alpha = \frac{1}{V_T} \int dV_i \, \delta_i . \tag{10} $$
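The step from $\hat{\bar{n}} = \bar{n}(1+\alpha)$ to the expansion of the estimated overdensity can be spelled out as follows. This is a short worked derivation using only the definitions above, with the smoothing window left implicit:

```latex
\hat{\delta}_i \equiv \frac{n_i}{\hat{\bar{n}}} - 1
  = \frac{\bar{n}\,(1+\delta_i)}{\bar{n}\,(1+\alpha)} - 1
  = \frac{\delta_i - \alpha}{1+\alpha}
  = (\delta_i - \alpha)\,(1 - \alpha + \alpha^2 - \dots),
\qquad |\alpha| < 1 .
```

Averaging the middle form over the survey gives $(\alpha - \alpha)/(1+\alpha) = 0$: the estimated overdensity has zero survey mean by construction, which is the origin of the integral constraint.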

Substituting the above into the expression for $\hat{\bar{\xi}}_2$ (eq. [4]), one can expand in
$\alpha$ and write down the ensemble average of $\hat{\bar{\xi}}_2$ order by order. It can be
easily shown that the zero-th order term (no $\alpha$) of $\langle \hat{\bar{\xi}}_2 \rangle$
gives $\bar{\xi}_2$, the true variance. The rest of the terms represent the
integral-constraint-bias.

A key question is at what order one should stop, which we will discuss in detail in §3.1. It
suffices to note here that, strictly speaking, $\alpha$ by itself cannot be used as an
ordering-parameter, because it is a random variable (it depends on the data) which gets
ensemble-averaged. Let us denote the integral-constraint-bias for $\hat{\bar{\xi}}_2$, or more
generally $\hat{\bar{\xi}}_N$, by:

$$ \langle \hat{\bar{\xi}}_N \rangle = \bar{\xi}_N \left( 1 + \frac{\Delta_{\bar{\xi}_N}}{\bar{\xi}_N} \right) . \tag{11} $$

As we will see, the fractional bias $\Delta_{\bar{\xi}_N} / \bar{\xi}_N$ becomes small for a large
enough survey. This is the limit in which we will be working. How large is large, or how small is
small, is the question we would like to address.

It is easy to see that the above methodology can be adapted for computing the ensemble
average of the standard estimator for the hierarchical amplitude, $\langle \hat{S}_N \rangle$. The
key idea is to assume that the denominator of the estimator fluctuates about its mean, and to
expand in that fluctuation. In other words, for $\hat{S}_N$, let us first assume

$$ \hat{\bar{\xi}}_2 = \bar{\xi}_2 \left( 1 + \frac{\Delta_{\bar{\xi}_2}}{\bar{\xi}_2} \right) (1 + \epsilon) , \tag{12} $$

where $\epsilon$ is a small fluctuation of the measured $\hat{\bar{\xi}}_2$ from its mean (which
is offset from the true $\bar{\xi}_2$ by a bias; eq. [11]). Note that $\epsilon$ depends on the
data, and so cannot be taken out of $\langle\ \rangle$.



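Putting eq. (11) and (12) into eq. (3), one obtains the following expression for the mean of
$\hat{S}_N$:

$$ \langle \hat{S}_N \rangle = S_N \left( 1 + \frac{\Delta_{S_N}}{S_N} \right) \tag{13} $$

$$ \frac{\Delta_{S_N}}{S_N} = \frac{\Delta_{\bar{\xi}_N}}{\bar{\xi}_N} - (N-1) \frac{\Delta_{\bar{\xi}_2}}{\bar{\xi}_2} - (N-1) \frac{\langle \epsilon \, \hat{\bar{\xi}}_N \rangle}{\bar{\xi}_N} + \frac{(N-1)N}{2} \frac{\langle \epsilon^2 \hat{\bar{\xi}}_N \rangle}{\bar{\xi}_N} + \dots \tag{14} $$

Note that all terms up to $\epsilon^2$ are displayed above, except for the cross terms
proportional to $(\Delta_{\bar{\xi}_2}/\bar{\xi}_2)[\langle \epsilon \, \hat{\bar{\xi}}_N \rangle / \bar{\xi}_N]$
and $(\Delta_{\bar{\xi}_2}/\bar{\xi}_2)[\langle \epsilon^2 \hat{\bar{\xi}}_N \rangle / \bar{\xi}_N]$.
As we will show later on, the terms $\Delta_{\bar{\xi}_2}/\bar{\xi}_2$,
$\langle \epsilon \, \hat{\bar{\xi}}_N \rangle / \bar{\xi}_N$ and
$\langle \epsilon^2 \hat{\bar{\xi}}_N \rangle / \bar{\xi}_N$ are all of the order of a small
parameter in which we will be expanding: hence the dropping of products involving them.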
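The structure of the $\epsilon$-expansion can be checked on a small exact example. The sketch below assumes a hypothetical toy model in which a single fluctuation x drives two unbiased, positively correlated estimators A and B; it compares the exact mean of A / B^(N-1) with the naive ratio of means and with the second-order expansion in $\epsilon$ (the last two terms displayed in the expansion above, with no bias in A or B).

```python
from fractions import Fraction

# Hypothetical toy model: x = -0.1 or +0.1 with equal probability.
X_VALUES = [Fraction(-1, 10), Fraction(1, 10)]
N = 3  # skewness-like case: S_hat = A / B**(N - 1)


def average(values):
    """Ensemble average over equally likely outcomes."""
    values = list(values)
    return sum(values) / len(values)


def estimator_a(x):
    """Numerator estimator, unbiased with <A> = 1."""
    return 1 + 2 * x


def estimator_b(x):
    """Denominator estimator, unbiased with <B> = 1, so epsilon = x."""
    return 1 + x


mean_a = average(map(estimator_a, X_VALUES))
mean_b = average(map(estimator_b, X_VALUES))

exact = average(estimator_a(x) / estimator_b(x) ** (N - 1) for x in X_VALUES)
naive = mean_a / mean_b ** (N - 1)

# Second-order expansion in epsilon = B / <B> - 1 (here epsilon = x).
eps_a = average(x * estimator_a(x) for x in X_VALUES) / mean_a
eps2_a = average(x * x * estimator_a(x) for x in X_VALUES) / mean_a
second_order = naive * (1 - (N - 1) * eps_a + Fraction((N - 1) * N, 2) * eps2_a)

print(float(exact), float(second_order), float(naive))
```

In this toy case the exact mean lies below the naive ratio, and the second-order expansion recovers most of the difference even though the term linear in $\epsilon$ and the term quadratic in $\epsilon$ are of comparable size.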
It is instructive to divide the net bias of the estimator $\hat{S}_N$ into two different
contributions. One is due to the integral-constraint-bias discussed above: namely that both
$\hat{\bar{\xi}}_N$ in the numerator and $\hat{\bar{\xi}}_2$ in the denominator (see eq. [3]) are
biased estimators. This gives the first two terms on the right hand side of eq. (14). The other
contribution arises from the fact that the estimator $\hat{S}_N$ is the ratio of two other
estimators (to be accurate, it is in fact a nonlinear combination of $\hat{\bar{\xi}}_N$ and
$\hat{\bar{\xi}}_2$). This is the rest of the terms in eq. (14). Let us give these two kinds of
terms explicit names:

$$ \frac{\Delta^{\rm int.constr.}_{S_N}}{S_N} = \frac{\Delta_{\bar{\xi}_N}}{\bar{\xi}_N} - (N-1) \frac{\Delta_{\bar{\xi}_2}}{\bar{\xi}_2} \tag{15} $$

$$ \frac{\Delta^{\rm ratio}_{S_N}}{S_N} = -(N-1) \frac{\langle \epsilon \, \hat{\bar{\xi}}_N \rangle}{\bar{\xi}_N} + \frac{(N-1)N}{2} \frac{\langle \epsilon^2 \hat{\bar{\xi}}_N \rangle}{\bar{\xi}_N} , \tag{16} $$

the integral-constraint-bias and the ratio-bias of $\hat{S}_N$ respectively. We are slightly
abusing the terminology here because the integral-constraint-bias is of course itself a form of
ratio-bias. As we will show below, it turns out that the terms contributing to the
integral-constraint-bias partially cancel each other, leaving the ratio-bias as the dominant
contribution to the net bias of $\hat{S}_N$ on large scales.

Note that we have implicitly assumed that $\Delta_{\bar{\xi}_N}/\bar{\xi}_N$,
$(N-1)\Delta_{\bar{\xi}_2}/\bar{\xi}_2$, $(N-1)\langle \hat{\bar{\xi}}_N \epsilon \rangle / \bar{\xi}_N$
and $(N-1)N \langle \epsilon^2 \hat{\bar{\xi}}_N \rangle / 2\bar{\xi}_N$ are all small, and that the
terms we have ignored in eq. (14) are somehow all of higher order. What is the correct ordering
parameter in which we are expanding? Again, the quantity $\epsilon$ cannot be used directly to
keep track of the ordering, because it is data-dependent. For instance, after taking the
expectation values, it could happen that terms that contain $\epsilon^2$ are actually comparable
to terms linear in $\epsilon$. We will see in the next section that this is indeed the case.

Eq. (14) implies that the bias of $\hat{S}_N$ depends in general on the M-point correlation
functions. The strategy we adopt in this paper is to assume the following hierarchical relation:
$\xi_M \sim \xi_2^{M-1}$, which is motivated by perturbation theory but has been observed to hold
in the highly nonlinear regime as well. We do not need to assume anything, however, about the
configuration or scale dependence (or independence) of the hierarchical amplitudes. Using this
relation, it can be shown that (we will demonstrate this explicitly for $S_3$) the terms we have
kept in $\Delta_{S_N}/S_N$ (eq. [14]) all contain terms linear in the following quantity:

$$ \bar{\xi}_2^L \equiv \frac{1}{V_T^2} \int dV_i \, dV_j \, \bar{\xi}_2(i,j) \tag{17} $$

where $\bar{\xi}_2(i,j)$ is a smoothed version of the 2-point function defined as

$$ \bar{\xi}_2(i,j) = \int dV_k \, dV_l \, \xi_2^{\rm unsmth.}(k,l) \, W(k,i) \, W(l,j) \tag{18} $$

where W is the top-hat smoothing window of some radius R, and $\xi_2^{\rm unsmth.}$ is the
unsmoothed 2-point function. $\bar{\xi}_2^L$ is the 2-point function averaged over the whole
survey (of size L). For a survey to be of any use at all, this quantity has to be much smaller
than 1. Because of the relatively large coefficients that will be multiplying it in
$\Delta_{S_N}/S_N$, as will be shown below, the fractional bias in $\hat{S}_N$ could be
non-negligible, even for a relatively large survey volume.

One should therefore view the derivation of the bias in $\hat{S}_N$ in this paper as an expansion
in the small parameter $\bar{\xi}_2^L$. As we will see, the same applies to $\hat{\bar{\xi}}_N$.
Indeed, it can be shown that all the terms we have kept in eq. (13) contain terms linear in
$\bar{\xi}_2^L$, and the terms we have ignored are all of higher order ($[\bar{\xi}_2^L]^2$, etc).
We will demonstrate the reasoning with an example: N = 3.
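To get a feel for how the survey-averaged two-point function falls with survey size, the sketch below evaluates a one-dimensional analogue of eq. (17). Everything specific here is an assumption made for illustration: the survey is a line segment, smoothing is ignored, and the two-point function is taken to be exp(-r / r0) with a hypothetical correlation length r0. For this model the double integral has the closed form (2/a)[1 - (1 - e^{-a})/a] with a = L / r0, which the code uses as a cross-check.

```python
import math


def survey_averaged_xi(length, corr_length, n_cells=200):
    """Average of xi(|x - y|) over all pairs of points in a 1D survey.

    Discretized analogue of eq. (17), using a midpoint grid of n_cells cells
    and a hypothetical exponential two-point function exp(-r / corr_length).
    """
    step = length / n_cells
    centers = [(k + 0.5) * step for k in range(n_cells)]
    total = sum(math.exp(-abs(x - y) / corr_length) for x in centers for y in centers)
    return total / n_cells**2


def survey_averaged_xi_exact(length, corr_length):
    """Closed form of the same double integral for the exponential model."""
    a = length / corr_length
    return (2.0 / a) * (1.0 - (1.0 - math.exp(-a)) / a)


for length in (10.0, 20.0, 40.0):
    print(length, survey_averaged_xi(length, 1.0), survey_averaged_xi_exact(length, 1.0))
```

In this toy model the average falls roughly as 2 r0 / L once L is much larger than r0, so doubling the survey length only halves the small parameter.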



3. Estimation-Biases for the Variance, the Third Moment and the Skewness 

3.1. The Integral-Constraint-Bias for $\hat{\bar{\xi}}_2$

The integral-constraint-bias for the two-point function has been known for a long time (see
e.g. Peebles 1980; Bernstein 1994; Tegmark et al. 1998). Our treatment follows most closely that
of Bernstein (1994), and the emphasis is on techniques that can be generalized to
$\hat{\bar{\xi}}_3$ and $\hat{S}_3$.



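Following the strategy outlined in §2.2, namely combining eq. (9), (10) with (4), we obtain

$$ \begin{aligned}
\langle \hat{\bar{\xi}}_2 \rangle = {} & \frac{1}{V_T} \int dV_i \, \langle \delta_i^2 \rangle - \langle \hat{\bar{\xi}}_2^{\rm shot} \rangle \\
& - \frac{1}{V_T^2} \int dV_i \, dV_j \, \langle \delta_i \delta_j \rangle - \frac{2}{V_T^2} \int dV_i \, dV_j \, \langle \delta_i^2 \delta_j \rangle \\
& + \frac{3}{V_T^3} \int dV_i \, dV_j \, dV_k \, \langle \delta_i^2 \delta_j \delta_k \rangle + \frac{4}{V_T^3} \int dV_i \, dV_j \, dV_k \, \langle \delta_i \delta_j \delta_k \rangle + \dots
\end{aligned} \tag{19} $$

where we display all terms up to $O(\alpha^2)$ (see eq. [9]).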
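For readers who wish to see where the numerical coefficients come from, the expansion of the squared estimated overdensity can be written out as follows. This is a worked step using only $\hat{\delta}_i = (\delta_i - \alpha)/(1+\alpha)$ and $\alpha = V_T^{-1} \int dV_i \, \delta_i$; shot-noise is left aside.

```latex
\hat{\delta}_i^{\,2} = \frac{(\delta_i-\alpha)^2}{(1+\alpha)^2}
  = (\delta_i^2 - 2\alpha\delta_i + \alpha^2)\,(1 - 2\alpha + 3\alpha^2 - \dots)
  = \delta_i^2 - 2\alpha\delta_i - 2\alpha\delta_i^2 + \alpha^2
    + 4\alpha^2\delta_i + 3\alpha^2\delta_i^2 + O(\alpha^3) ,
\\[4pt]
\frac{1}{V_T}\int dV_i\,\hat{\delta}_i^{\,2}
  = \frac{1}{V_T}\int dV_i\,\delta_i^2 \;-\; \alpha^2
    \;-\; \frac{2\alpha}{V_T}\int dV_i\,\delta_i^2
    \;+\; \frac{3\alpha^2}{V_T}\int dV_i\,\delta_i^2 \;+\; 4\alpha^3 + \dots
```

In the second line the survey average of $\delta_i$ has been replaced by $\alpha$, so that $-2\alpha\delta_i + \alpha^2$ averages to $-\alpha^2$ and $4\alpha^2\delta_i$ to $4\alpha^3$; only the terms descending from the $O(\alpha^2)$ expansion are shown. Writing each factor of $\alpha$ as an integral over the survey and taking the ensemble average gives the coefficients $-1$, $-2$, $+3$ and $+4$ of the successive bias terms.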



As we have explained before, $\alpha$ cannot really be used as an ordering-parameter, because it
is a random variable which gets averaged over. The key to the above expansion is instead to keep
only terms up to linear order in $\bar{\xi}_2^L$ (eq. [17]).

Ignoring shot-noise for now, the first term on the right gives us the true variance
$\bar{\xi}_2$. This is the zero-th order term. The rest of the terms represent the
integral-constraint-bias for $\hat{\bar{\xi}}_2$, and are all of order $\bar{\xi}_2^L$ or higher.
Let us check.



1. The first term on the second line of eq. (19) is none other than $-\bar{\xi}_2^L$ itself
(eq. [17]).

2. The next term, again ignoring shot-noise for now, gives
$(-2/V_T^2) \int dV_i \, dV_j \, \bar{\xi}_3(i,i,j)$. Applying the hierarchical relation
$\xi_M \sim \xi_2^{M-1}$, i.e. $\bar{\xi}_3(i,i,j) \sim \bar{\xi}_2(i,i)\bar{\xi}_2(i,j) + \dots$,
we can see that the term $\bar{\xi}_2(i,i)\bar{\xi}_2(i,j) = \bar{\xi}_2 \, \bar{\xi}_2(i,j)$, when
integrated over i and j, will give rise to a term proportional to $\bar{\xi}_2^L$. Hence, the
term $(-2/V_T^2) \int dV_i \, dV_j \, \bar{\xi}_3(i,i,j)$ contains a linear piece and should not be
thrown away. Note that we need not make any assumption about the configuration or scale
dependence of the hierarchical amplitudes. In other words, we should keep the term
$(-2/V_T^2) \int dV_i \, dV_j \, \bar{\xi}_3(i,i,j)$ as is, rather than as, say,
$(-2/V_T^2) \, \bar{\xi}_2 \int dV_i \, dV_j \, \bar{\xi}_2(i,j)$. The hierarchical relation is used
strictly for keeping track of the ordering at this stage.

3. The integrand in the next term, $3\langle \delta_i^2 \delta_j \delta_k \rangle$, can be broken
up into several disconnected pieces:
$3\langle \delta_i^2 \delta_j \delta_k \rangle = 3[\bar{\xi}_2(i,i)\bar{\xi}_2(j,k) + 2\bar{\xi}_2(i,j)\bar{\xi}_2(i,k) + \bar{\xi}_4(i,i,j,k)]$.
It is easy to see that the second piece gives something that is second order in $\bar{\xi}_2^L$
when integrated over, and so does the $\bar{\xi}_4(i,i,j,k)$ piece, assuming again the
hierarchical relation. The only piece that survives is then, after integration,
$3 \bar{\xi}_2 \bar{\xi}_2^L$.

4. Lastly, the $(4/V_T^3) \int dV_i \, dV_j \, dV_k \, \langle \delta_i \delta_j \delta_k \rangle$
term in eq. (19) is second order in $\bar{\xi}_2^L$, again by applying the hierarchical relation.

More generally, it can be seen that all terms, including those not explicitly displayed, in the
expansion in eq. (19) are of the form
$V_T^{-m} \int dV_{i_1} \dots dV_{i_m} \langle \delta_{i_1}^{\gamma} \delta_{i_2} \dots \delta_{i_m} \rangle$,
where $\gamma$ is 1 or 2. Our arguments above show that only terms with m = 1, or m = 2, or m = 3
and $\gamma = 2$, contain pieces linear in $\bar{\xi}_2^L$. Readers can convince themselves that
all other terms are of higher order.

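Putting everything together, ignoring shot-noise, and using the definition of the fractional bias
in eq. (11), we obtain to linear order in $\bar{\xi}_2^L$

$$ \begin{aligned}
\frac{\Delta_{\bar{\xi}_2}}{\bar{\xi}_2} = {} & -\frac{1}{\bar{\xi}_2 V_T^2} \int dV_i \, dV_j \, \bar{\xi}_2(i,j) - \frac{2}{\bar{\xi}_2 V_T^2} \int dV_i \, dV_j \, \bar{\xi}_3(i,i,j) \\
& + \frac{3}{V_T^2} \int dV_i \, dV_j \, \bar{\xi}_2(i,j) .
\end{aligned} \tag{20} $$

The above result is consistent with that of Bernstein (1994) for the integral-constraint-bias of
the two-point function. The first term on the right was obtained by Peebles (1980) via a
different argument.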
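The first term of the fractional bias, $-\bar{\xi}_2^L/\bar{\xi}_2$, is the part that comes purely from subtracting a mean estimated from the same sample: for the simpler estimator $V_T^{-1}\int dV_i (\delta_i - \alpha)^2$, with no division by $(1+\alpha)^2$, the ensemble mean is exactly $\bar{\xi}_2 - \bar{\xi}_2^L$. The sketch below checks this by brute-force enumeration on a hypothetical toy field of a few independent cells, for which the survey-averaged two-point function is simply the variance divided by the number of cells (the familiar 1/N bias of the sample variance). The toy field and function name are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def mean_of_additive_estimator(n_cells, amplitude):
    """Exact ensemble mean of (1/N) sum_i (delta_i - alpha)^2.

    Hypothetical toy field: each of the n_cells cells independently takes
    delta = +amplitude or -amplitude with equal probability; alpha is the
    mean of delta over the cells of one realization.
    """
    outcomes = list(product([-amplitude, amplitude], repeat=n_cells))
    total = Fraction(0)
    for deltas in outcomes:
        alpha = sum(deltas) / n_cells
        total += sum((d - alpha) ** 2 for d in deltas) / n_cells
    return total / len(outcomes)


n_cells, amplitude = 4, Fraction(1, 2)
variance = amplitude**2              # true variance of the toy field
survey_average = variance / n_cells  # survey-averaged two-point function for independent cells

# Both numbers agree: the estimator is low by exactly the survey average.
print(mean_of_additive_estimator(n_cells, amplitude), variance - survey_average)
```

The remaining two terms of the fractional bias come from the division by the estimated mean and have no counterpart in this additive toy estimator.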

How about shot-noise? A term like $\langle \delta_i \delta_j \rangle$ that shows up in eq. (19)
includes both the cosmic 2-point correlation $\bar{\xi}_2(i,j)$ (see eq. [18]) and a shot-noise
contribution due to Poisson sampling (see e.g. Feldman et al. 1994):

$$ \langle \delta_i \delta_j \rangle = \bar{\xi}_2(i,j) + \frac{1}{\bar{n}} \int dV_k \, W(k,i) \, W(k,j) \tag{21} $$

It can be shown that all shot-noise contributions to $\Delta_{\bar{\xi}_2}/\bar{\xi}_2$ can be
expanded in either $1/N_R$ or $1/N_L$, where $N_R$ is the mean number of particles in a cell of
size R, and $N_L$ is the mean number of particles in the whole survey. For instance, the term
$V_T^{-1} \int dV_i \langle \delta_i^2 \rangle$ has a Poisson term $1/N_R$, if one makes use of
eq. (21), together with the fact that W is a top-hat of size R with the normalization
$\int dV_j \, W(i,j) = 1$. This gets canceled by part of the shot-noise correction term
$-\langle \hat{\bar{\xi}}_2^{\rm shot} \rangle$ (eq. [19]). On the other hand, a term like
$V_T^{-2} \int dV_i \, dV_j \langle \delta_i \delta_j \rangle$ gives us a Poisson term of the order
of $1/N_L$. In general, $1/N_L$ is a very small quantity, and so we can ignore all terms of order
$1/N_L$. There are, for instance, Poisson pieces linear in $\bar{\xi}_2^L$ in the
$\langle \delta_i \delta_j \delta_k \rangle$ term in eq. (19), but they are also of order $1/N_L$,
and so can be ignored.
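The Poisson term $1/N_R$ quoted above is the variance of the overdensity for pure Poisson counts in a cell. A minimal numerical check, assuming an unclustered field and a hypothetical cutoff `k_max` placed far out in the tail of the distribution:

```python
import math


def poisson_overdensity_variance(mean_count, k_max=200):
    """Variance of delta = N / mean_count - 1 for Poisson-distributed counts N.

    The Poisson probabilities are summed directly up to k_max, which is
    assumed to be far in the tail of the distribution.
    """
    probability = math.exp(-mean_count)  # P(N = 0)
    variance = 0.0
    for k in range(k_max + 1):
        delta = k / mean_count - 1.0
        variance += probability * delta * delta
        probability *= mean_count / (k + 1)  # P(N = k + 1) from P(N = k)
    return variance


for mean_count in (2.0, 5.0, 20.0):
    print(mean_count, poisson_overdensity_variance(mean_count), 1.0 / mean_count)
```

The two printed columns agree for each mean count, which is why a correction of the form 1 / (mean count per cell) removes the Poisson part of the second moment.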



How about the $O(1/N_R)$ terms in eq. (19)? Without going into details, it can be shown
that the only $O(1/N_R)$ contributions are: $-\bar{\xi}_2^L/N_R$ from the
$-\langle \hat{\bar{\xi}}_2^{\rm shot} \rangle$ term, $-2\bar{\xi}_2^L/N_R$ from the fourth term
on the right, and $3\bar{\xi}_2^L/N_R$ from the fifth term on the right. Therefore, to
$O(1/N_R)$, the shot-noise terms cancel exactly. It is interesting to note that this cancellation
is possible only because the integral-constraint-bias in the shot-noise correction itself is
taken into account, i.e.
$\langle \hat{\bar{\xi}}_2^{\rm shot} \rangle = \langle 1/\hat{N}_R \rangle = (1 + \langle \alpha^2 \rangle \dots)/N_R$
(see eq. [7] and the expansion of $\hat{\bar{n}}$ in $\alpha$).

We will see in §4 that our estimates of the biases in the variance and skewness, which ignore
shot-noise, are in fact quite accurate. With no further justification, we will ignore shot-noise
terms in the rest of our derivation, which substantially simplifies our expressions. The
cancellation to $O(1/N_R)$ here for $\Delta_{\bar{\xi}_2}/\bar{\xi}_2$ can be taken as suggestive
evidence that shot-noise is unimportant for the estimation-biases we are interested in; we will
rely on the numerical simulations in §4 for further support. We should emphasize, however, that
we are not saying that there is no need to subtract out shot-noise when estimating the variance
and skewness themselves.

Finally, how about edge-effects? It is worth emphasizing that no assumptions about the
edge-effects being small need to be made in deriving eq. (20). One only has to be careful about
the volume over which the integration is done and what $V_T$ has to be. As we have mentioned
earlier in §2.1, one way to deal with the boundary is to use only cells that do not overlap with
the edges. In that case, the dummy variables of integration i and j should range over the inner
part of the survey, where any cell that is centered within it does not cut into the boundary.
Similarly, $V_T$ should be chosen to be the volume of that inner region. We will discuss how to
approximate such integrals in §3.4.



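3.2. The Integral-Constraint-Bias for $\hat{\bar{\xi}}_3$

Similarly, one can derive the integral-constraint-bias for the third cumulant by applying the
same expansion in $\alpha$ to the estimator for the third cumulant (eq. [8]). Ignoring
shot-noise, and again keeping only terms to first order in $\bar{\xi}_2^L$, the
integral-constraint-bias for $\hat{\bar{\xi}}_3$ is given by

$$ \begin{aligned}
\frac{\Delta_{\bar{\xi}_3}}{\bar{\xi}_3} = {} & -\frac{3}{\bar{\xi}_3 V_T^2} \int dV_i \, dV_j \, \bar{\xi}_3(i,i,j) - \frac{3}{\bar{\xi}_3 V_T^2} \int dV_i \, dV_j \, \bar{\xi}_4(i,i,i,j) \\
& + \frac{6}{V_T^2} \int dV_i \, dV_j \, \bar{\xi}_2(i,j) .
\end{aligned} \tag{22} $$

Note that by essentially the same reasoning as in the case of $\hat{\bar{\xi}}_2$, one only needs
to consider terms up to $\alpha^2$ in deriving the above.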



3.3. The Estimation-Bias for $\hat{S}_3$



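The integral-constraint-bias for $\hat{S}_3$ can be simply read off from eq. (15), (20) & (22).
The ratio-bias for $\hat{S}_3$, on the other hand, follows from eq. (16). Substituting the
definition of $\epsilon$ from eq. (12), we obtain

$$ \begin{aligned}
\frac{\Delta^{\rm ratio}_{S_3}}{S_3} = {} & -\frac{2}{\bar{\xi}_3 \, \bar{\xi}_2 \, (1 + \Delta_{\bar{\xi}_2}/\bar{\xi}_2)} \left\langle \hat{\bar{\xi}}_3 \left[ \hat{\bar{\xi}}_2 - \bar{\xi}_2 \left( 1 + \frac{\Delta_{\bar{\xi}_2}}{\bar{\xi}_2} \right) \right] \right\rangle \\
& + \frac{3}{\bar{\xi}_3 \, \bar{\xi}_2^2 \, (1 + \Delta_{\bar{\xi}_2}/\bar{\xi}_2)^2} \left\langle \hat{\bar{\xi}}_3 \left[ \hat{\bar{\xi}}_2 - \bar{\xi}_2 \left( 1 + \frac{\Delta_{\bar{\xi}_2}}{\bar{\xi}_2} \right) \right]^2 \right\rangle .
\end{aligned} \tag{23} $$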

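Substituting the definitions of the estimators $\hat{\bar{\xi}}_2$ and $\hat{\bar{\xi}}_3$ and the
expansion of $\hat{\delta}_i$ in $\alpha$ into eq. (23) for N = 3, ignoring shot-noise terms and
keeping only terms linear in $\bar{\xi}_2^L$, it can be shown that

$$ \begin{aligned}
\frac{\Delta^{\rm ratio}_{S_3}}{S_3} = {} & -\frac{6}{\bar{\xi}_3 V_T^2} \int dV_i \, dV_j \, \bar{\xi}_3(i,j,j) - \frac{2}{\bar{\xi}_3 \bar{\xi}_2 V_T^2} \int dV_i \, dV_j \, \bar{\xi}_5(i,i,i,j,j) \\
& + \frac{3}{\bar{\xi}_2^2 V_T^2} \int dV_j \, dV_k \, \bar{\xi}_4(j,j,k,k) ,
\end{aligned} \tag{24} $$

where $\bar{\xi}_N$ with arguments is the connected N-point function. The first two terms here
arise from the first term on the right hand side of eq. (23), and the last term from the second
one.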



The reasoning used to arrive at the above expression is again very similar to the case of £ 2 . 
But there are a few new tips to keep in mind. 

1. Consider a term like (6f [Sj — £ 2 (1 + /£ 2 )])) which arises from the first term on the right 
of eq. (p3|). It can be seen that a disconnected piece with the j index all by itself, such as (5f)(5j), 
is going to be canceled, by — (Sf }£ 2 (1 + A| /£ 2 )- More generally, it can be seen that all terms in the 
expansion of eq. ( f23| ) contain an integrand of the form (5f <5f 1 <^ 2 ---<5j TO )j an y disconnected piece of 
which with the ji, or ... j m index all by itself is going to get canceled. In other words, the j-indices 
must be connected to each other, or to i. 

2. As before, integrals over products of the two-point function, of the form $\langle\delta_i\delta_j\rangle\langle\delta_j\delta_k\rangle$ for instance, are of higher order. An example is from the second term on the right of eq. (23). It contains a term with $\langle\delta_i^3\delta_j^2\delta_k^2\rangle$ in the integrand. This can be broken up into several disconnected pieces. Most of them are canceled for the reason laid out in 1. above. One disconnected piece that is not canceled is $\langle\delta_i^3\delta_j\rangle\langle\delta_j\delta_k^2\rangle$. However, applying the hierarchical relation as before, we can see that this piece, when integrated over, is second order in $\bar\xi_2^L$. Another disconnected piece is $\langle\delta_i^3\rangle\langle\delta_j\delta_k\rangle^2$. This deserves special attention. It gives rise to a term in the estimation-bias of the order of $V_T^{-2}\int dV_i\, dV_j\, [\xi_2(i,j)]^2$, which is not guaranteed to be much smaller than $\bar\xi_2^L$ in general. However, we have checked numerically that for realistic power spectra, and $L$ larger than about $10\,h^{-1}$ Mpc, such terms are indeed small compared to $\bar\xi_2^L$. To summarize, any products of the two-point function where the arguments are non-degenerate (i.e. $\xi_2(i,j)$ where $i \neq j$) can be ignored.

3. Combining 1. and 2., it can be seen that all $\epsilon^3$ terms (or higher) can be ignored in eq. (23). The same argument applies for arbitrary N, and so justifies the dropping of $\epsilon^3$ terms from
eq. (0) or @. 



4. Note that eq. (24) can be derived by ignoring $\Delta_{\bar\xi_2}$ and $\Delta_{\bar\xi_3}$ altogether. Since the lowest order terms in eq. (23) are already of order $\bar\xi_2^L$, including $\Delta_{\bar\xi_2}$ and $\Delta_{\bar\xi_3}$ can only give higher order terms.

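Lastly, combining eq. (20), (22) and (24) and substituting into eq. (14) and (16), we obtain the net bias of the estimator $\hat S_3$:

$$\frac{\Delta_{S_3}}{S_3} = 2\,\frac{\bar\xi_2^L}{\bar\xi_2} + \left(\frac{4}{\bar\xi_2} - \frac{9}{\bar\xi_3}\right) \frac{1}{V_T^2}\int dV_i\, dV_j\, \xi_3(i,j,j) - \frac{3}{\bar\xi_3 V_T^2}\int dV_i\, dV_j\, \xi_4(i,i,i,j) - \frac{2}{\bar\xi_2 \bar\xi_3 V_T^2}\int dV_i\, dV_j\, \xi_5(i,i,i,j,j) + \frac{3}{\bar\xi_2^{\,2} V_T^2}\int dV_j\, dV_k\, \xi_4(j,j,k,k) \tag{25}$$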



3.4. An Analytical Approximation 



The expressions in eq. (20), (22) & (25) give the exact fractional bias in $\hat{\bar\xi}_2$, $\hat{\bar\xi}_3$ and $\hat S_3$ to first order in $\bar\xi_2^L$, excluding shot-noise. No assumption about the two-point function $\xi_2$ itself being small has been made. The hierarchical relation $\xi_N \sim \xi_2^{N-1}$ has only been used for book-keeping. We have not assumed anything about the configuration or scale dependence of the hierarchical amplitudes.



In the present form, these expressions are not very useful as the N-point functions, up to N = 5, are required to compute the estimation-biases. We will approximate them as products of the two-point function (it is from this point on that we use the hierarchical relation for more than simply book-keeping) using the following relation (see Bernardeau 1994):

$$\langle \delta_i^m \delta_j^{m'} \rangle_c \simeq c_{mm'}\, \bar\xi_2^{\,m+m'-2}\, \xi_2(i,j) \tag{26}$$

where $\langle \delta_i^m \delta_j^{m'} \rangle_c$ is the connected cosmic $(m+m')$-point function (no Poisson terms), with at most two differing indices.



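Putting eq. (26) into eq. (20), (22) and (25) respectively, we obtain:

$$\frac{\Delta_{\bar\xi_2}}{\bar\xi_2} = -\frac{\bar\xi_2^L}{\bar\xi_2} + \left(3 - 2c_{12}\right)\bar\xi_2^L \tag{27}$$

$$\frac{\Delta_{\bar\xi_3}}{\bar\xi_3} = -3\,\frac{c_{12}}{S_3}\,\frac{\bar\xi_2^L}{\bar\xi_2} + \left(6 - 3\,\frac{c_{13}}{S_3}\right)\bar\xi_2^L \tag{28}$$

$$\frac{\Delta_{S_3}}{S_3} = \left(2 - 9\,\frac{c_{12}}{S_3}\right)\frac{\bar\xi_2^L}{\bar\xi_2} + \left(4c_{12} - 3\,\frac{c_{13}}{S_3} - 2\,\frac{c_{23}}{S_3} + 3c_{22}\right)\bar\xi_2^L \tag{29}$$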
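It is also instructive to distinguish between the two different contributions to $\Delta_{S_3}/S_3$, as in eq. (14) & (16), one from the integral-constraint-biases of $\hat{\bar\xi}_2$ and $\hat{\bar\xi}_3$ themselves:

$$\frac{\Delta^{\rm int.\,constr.}_{S_3}}{S_3} = \left(2 - 3\,\frac{c_{12}}{S_3}\right)\frac{\bar\xi_2^L}{\bar\xi_2} + \left(4c_{12} - 3\,\frac{c_{13}}{S_3}\right)\bar\xi_2^L \tag{30}$$

and the other from the ratio-bias due to the division of $\hat{\bar\xi}_3$ by $\hat{\bar\xi}_2^{\,2}$:

$$\frac{\Delta^{\rm ratio}_{S_3}}{S_3} = -6\,\frac{c_{12}}{S_3}\,\frac{\bar\xi_2^L}{\bar\xi_2} + \left(3c_{22} - 2\,\frac{c_{23}}{S_3}\right)\bar\xi_2^L \tag{31}$$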



In essence then, there are two kinds of terms in the estimation-biases, one that does not change with the smoothing scale $R$ (the $\bar\xi_2^L$ term), and the other that increases in magnitude as $R$ approaches the size of the survey $L$ (the $\bar\xi_2^L/\bar\xi_2$ term). We can write this in general as:

$$\frac{\Delta_E}{E} = \alpha_1\, \frac{\bar\xi_2^L}{\bar\xi_2} + \alpha_2\, \bar\xi_2^L \tag{32}$$

where $\Delta_E/E$ denotes the fractional estimation-bias for the estimator $E$, and $\alpha_1$ and $\alpha_2$ are coefficients that depend on various hierarchical amplitudes, such as $S_3$ and $c_{mm'}$.



The relation in eq. (26) is motivated by perturbation theory, and so, strictly speaking, only holds in the weakly nonlinear regime. But we have reasons to believe (from N-body work in preparation) that the same form should work on non-linear scales, albeit with the coefficients $c_{mm'}$ slightly altered from the perturbative (tree order) values (as is known to happen with $S_N$). In the same vein, we will use the tree order value for $S_3$ in the above estimates of the fractional biases. This is admittedly crude for small scales, but we will see in the next section that it is not a bad approximation. For $\bar\xi_2$ and $\bar\xi_2^L$ on the other hand, we have used both the linear and non-linear values and found little difference in the predicted biases. We use the linear values in all the figures of this paper.



The perturbative values for the various hierarchical amplitudes are (Bernardeau 1994; ignoring galaxy-bias):

$$\begin{aligned} S_3 &= 34/7 + \gamma \\ c_{12} &= 68/21 + \gamma/3 \\ c_{13} &= 11710/441 + 61\gamma/7 + 2\gamma^2/3 \\ c_{22} &= c_{12}^2 \\ c_{23} &= c_{12}\, c_{13} \end{aligned} \tag{33}$$

where $\gamma = \gamma(R)$:

$$\gamma \equiv \frac{d \log \bar\xi_2}{d \log R} \tag{34}$$

is the logarithmic slope of the variance. For a power-law power spectrum with spectral index $n$ we have: $\gamma = -(n + 3)$.
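As a quick numerical companion to eq. (33) and (34), the following minimal sketch evaluates the tree-level amplitudes for a power-law spectrum. The function name `pt_amplitudes` and the use of exact fractions are illustrative choices, not part of the original analysis.

```python
from fractions import Fraction


def pt_amplitudes(n):
    """Tree-level hierarchical amplitudes of eq. (33) for spectral index n.

    Assumes a pure power-law power spectrum, so that gamma = -(n + 3),
    and ignores galaxy-bias, as in the text.
    """
    gamma = -(Fraction(n) + 3)
    s3 = Fraction(34, 7) + gamma
    c12 = Fraction(68, 21) + gamma / 3
    c13 = Fraction(11710, 441) + Fraction(61, 7) * gamma + Fraction(2, 3) * gamma**2
    return {
        "gamma": gamma,
        "S3": s3,
        "c12": c12,
        "c13": c13,
        "c22": c12**2,      # c22 = c12^2
        "c23": c12 * c13,   # c23 = c12 * c13
    }


if __name__ == "__main__":
    for name, value in pt_amplitudes(-1).items():
        print(f"{name:>5s} = {float(value):.4f}")
```

Exact fractions are used only to make the arithmetic easy to check by hand; floats would serve equally well in practice.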

Substituting the above into eq. (27) to (31), it can be seen that, for $n$ of interest, a) the overall estimation-biases in $\hat{\bar\xi}_2$, $\hat{\bar\xi}_3$ and $\hat S_3$ are negative, i.e. $\alpha_i < 0$ in eq. (32); b) the integral-constraint-bias and ratio-bias contributions to $\Delta_{S_3}/S_3$ are comparable on small smoothing scales and c) the ratio-bias contribution to $\Delta_{S_3}/S_3$ dominates on large scales. All these are illustrated in Fig. 1, which shows the coefficients $\alpha_1$ (continuous line) and $\alpha_2$ (dashed line) as a function of $\gamma$, for different estimators $E = \bar\xi_2, S_3$.
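The coefficients $\alpha_1$ and $\alpha_2$ can be tabulated directly from the tree-level amplitudes. The sketch below is a minimal illustration under stated assumptions: it takes the coefficients to be $(-1,\ 3 - 2c_{12})$ for the variance, $(-3c_{12}/S_3,\ 6 - 3c_{13}/S_3)$ for the third moment, and $(2 - 9c_{12}/S_3,\ 4c_{12} - 3c_{13}/S_3 - 2c_{23}/S_3 + 3c_{22})$ for the skewness, split into integral-constraint and ratio parts. The function names and the sample values of $\bar\xi_2^L$ and $\bar\xi_2$ are hypothetical.

```python
def pt_amplitudes(gamma):
    """Tree-level S3, c12, c13, c22, c23 for logarithmic slope gamma."""
    s3 = 34.0 / 7.0 + gamma
    c12 = 68.0 / 21.0 + gamma / 3.0
    c13 = 11710.0 / 441.0 + 61.0 * gamma / 7.0 + 2.0 * gamma**2 / 3.0
    return s3, c12, c13, c12**2, c12 * c13


def bias_coefficients(gamma):
    """Return (alpha_1, alpha_2) of eq. (32) for each estimator.

    alpha_1 multiplies xi2_L / xi2, alpha_2 multiplies xi2_L.
    """
    s3, c12, c13, c22, c23 = pt_amplitudes(gamma)
    return {
        "xi2": (-1.0, 3.0 - 2.0 * c12),
        "xi3": (-3.0 * c12 / s3, 6.0 - 3.0 * c13 / s3),
        # integral-constraint part of the skewness bias
        "S3_int": (2.0 - 3.0 * c12 / s3, 4.0 * c12 - 3.0 * c13 / s3),
        # ratio part of the skewness bias
        "S3_ratio": (-6.0 * c12 / s3, 3.0 * c22 - 2.0 * c23 / s3),
        # net skewness bias = integral-constraint part + ratio part
        "S3": (
            2.0 - 9.0 * c12 / s3,
            4.0 * c12 - 3.0 * c13 / s3 - 2.0 * c23 / s3 + 3.0 * c22,
        ),
    }


def fractional_bias(alpha, xi2_L, xi2):
    """Evaluate eq. (32) for one estimator."""
    alpha_1, alpha_2 = alpha
    return alpha_1 * xi2_L / xi2 + alpha_2 * xi2_L


if __name__ == "__main__":
    coeffs = bias_coefficients(gamma=-2.0)  # n = -1
    for name, alpha in coeffs.items():
        # xi2_L = 0.01 and xi2 = 1.0 are illustrative values only
        print(name, alpha, fractional_bias(alpha, xi2_L=0.01, xi2=1.0))
```

For $n = -1$ all coefficients come out negative, and the $\alpha_1$ coefficient of the ratio part is several times larger in magnitude than that of the integral-constraint part, which is why the ratio-bias takes over as $R$ approaches the survey size.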

Note that for a Gaussian model where $c_{ij} = S_J = 0$, the estimation-biases are quite different. The coefficient $\alpha_2$ becomes 3 for $E = \bar\xi_2$ so that the bias is positive on small scales. Also, $\alpha_1 = 0$ and $\alpha_2 = 6$ for $E = \bar\xi_3$ while $\alpha_1 = 2$ and $\alpha_2 = 0$ for $E = S_3$, which means the bias is always positive. Hence the Gaussian prediction for the estimation-bias can be quite misleading (even for models with Gaussian initial conditions).

Finally, a word on what value to assume for $\bar\xi_2^L$. As we have noted before in §3.1, care should be taken in dealing with the edge-effects. The expression for $\bar\xi_2^L$ is given in eq. (17) and (18). In obtaining $\hat{\bar\xi}_2$ and $\hat{\bar\xi}_3$, if one insists on using only cells that do not overlap with the boundary, then one has to restrict $i$ and $j$ in eq. (17) to an inner region of the survey where any cell centered within it does not cut the edges, and one should equate $V_T$ with the volume of this inner region. On the other hand, the $k$ and $l$ indices of eq. (18) should still range over the whole survey volume.

We will not try to compute $\bar\xi_2^L$ exactly. Instead, we make use of the following observation: adhering to the convention for $i$, $j$, $k$ and $l$ above, we can rewrite $\bar\xi_2^L$ as $\int dV_k\, dV_l\, f_k f_l\, \xi_2^{\rm unsmth}(k,l)$, where $f_k = V_T^{-1} \int dV_i\, W(k,i)$. The quantity $f_k$ varies between $1/V_T$, for $k$ sufficiently far away from the edges, and 0, for $k$ sitting on the boundary. If one very crudely replaces $f_k$ by its volume-average, which is equal to the inverse of the total volume of the survey (everything within the boundary), it can be seen that $\bar\xi_2^L$ is then simply equal to $\bar\xi_2$, except with an unusual top-hat that covers the whole volume of the survey. We will further approximate this by estimating $\bar\xi_2^L$ using $\bar\xi_2$ with a spherical top-hat of size $R_L$ such that its volume is the same as that of the survey (e.g. if the survey is a cubical box of side-length $L$, then $L^3 = 4\pi R_L^3/3$). This is admittedly crude, but seems to be sufficiently accurate for the N-body experiments we study, at least for a smoothing scale $R$ which is not too large compared to the size of the survey. In practice, one might want to go back to the original definition of $\bar\xi_2^L$ (eq. 17), and compute $\bar\xi_2^L$ more carefully.
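The equal-volume prescription for $R_L$ is simple enough to check by hand. The minimal sketch below (the function name `equivalent_radius` is illustrative) converts the side-length of a cubical box into the radius of the sphere with the same volume.

```python
import math


def equivalent_radius(box_side):
    """Radius R_L of the sphere whose volume equals that of a cube of side L.

    Solves L^3 = 4 pi R_L^3 / 3 for R_L; units follow the input
    (e.g. h^-1 Mpc).
    """
    return (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0) * box_side


if __name__ == "__main__":
    for side in (40.0, 78.0, 300.0):
        print(f"L = {side:6.1f}  ->  R_L = {equivalent_radius(side):6.1f}")
```

For boxes of side 40 and 78 $h^{-1}$ Mpc this gives radii that round to 25 and 48 $h^{-1}$ Mpc, the values quoted for the simulated CfA/SSRS subsamples in §5.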



4. Comparison with N-body Simulations 

To test our analytical predictions in the last section, we use simulations of two different spatially flat cold dark matter (CDM) dominated models. One set of simulations is of the SCDM model, with $\Omega_0 = 1$, $h = 0.5$, and another is of the LCDM model with $\Omega_0 = 0.2$, $h = 1$ and $\Omega_\Lambda = 0.8$. The power spectra $P(k)$ for these models are taken from Bond & Efstathiou (1984) and Efstathiou, Bond and White (1992). The shape of $P(k)$ is parametrized by the quantity $\Gamma = \Omega h$, so that we have $\Gamma = 0.5$ CDM and $\Gamma = 0.2$ CDM. Each simulation contains $10^6$ particles in a box of comoving side-length 300 $h^{-1}$ Mpc and was run using a P$^3$M N-body code (Hockney & Eastwood 1981; Efstathiou et al. 1985). All outputs are normalized to $\sigma_8 = 1$. The simulations are described in more detail in Dalton et al. (1994) and Baugh et al. (1995). In this paper we use 10 realizations of each model for computing ensemble averages, with error bars being estimated from their standard deviation.

From each realization we extract one subsample within a cubical box of size $L = 300\,h^{-1}\,{\rm Mpc}/M$, where $M$ is taken to be an integer, M = 1, 2, 3, 7. So we have a set of 10 realizations of subsamples for each box-size, from $L \simeq 40\,h^{-1}$ Mpc to $300\,h^{-1}$ Mpc. We estimate moments of counts-in-cells (as in Baugh et al. 1995) in each set of subsamples to study how the estimation-biases of the variance and skewness vary with survey volume.
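For concreteness, the following minimal sketch shows one common way to estimate the variance and skewness from a list of counts-in-cells. It assumes the widely used shot-noise-corrected (factorial-moment) estimators, which may differ in detail from the estimators defined earlier in this paper; the function name is illustrative.

```python
def counts_in_cells_moments(counts):
    """Estimate (mean count, xi2, xi3, S3) from counts-in-cells.

    Assumed estimators (standard Poisson shot-noise subtraction):
        xi2 = (mu2 - Nbar) / Nbar^2
        xi3 = (mu3 - 3 mu2 + 2 Nbar) / Nbar^3
    where mu2 and mu3 are central moments of the counts. The division
    by the *estimated* mean Nbar is what introduces the
    integral-constraint-bias; the final ratio introduces the ratio-bias.
    """
    n_cells = len(counts)
    nbar = sum(counts) / n_cells
    mu2 = sum((c - nbar) ** 2 for c in counts) / n_cells
    mu3 = sum((c - nbar) ** 3 for c in counts) / n_cells
    xi2 = (mu2 - nbar) / nbar**2
    xi3 = (mu3 - 3.0 * mu2 + 2.0 * nbar) / nbar**3
    s3 = xi3 / xi2**2 if xi2 != 0.0 else float("nan")
    return nbar, xi2, xi3, s3


if __name__ == "__main__":
    sample_counts = [3, 7, 0, 12, 5, 1, 9, 4, 2, 6]  # made-up counts
    print(counts_in_cells_moments(sample_counts))
```

Applying such a function to each subsample separately, and then averaging the results over realizations, is what exposes the volume dependence of the bias.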

Note the importance of using subsamples of large simulations, rather than running simulations with an intrinsically small box-size. The latter introduces dynamical effects due to the missing large-scale power, which we are not interested in for the purpose of this paper. Note also the importance of taking one subsample from each realization, rather than extracting multiple subsamples from a single realization, to ensure the statistical independence of the subsamples.



4.1. The Variance $\bar\xi_2$

The results for the variance of the LCDM model are shown in Figure 2. Because the LCDM has more power on large scales this model shows a more pronounced integral-constraint-bias compared to the SCDM model. Open circles show the measured variance averaged over the 10 realizations of the full box ($L = 300\,h^{-1}$ Mpc). Filled triangles show the mean measured variance for the smaller boxes with the box-size $L = 300/2$, $300/4$, $300/5$, $300/7\ h^{-1}$ Mpc as indicated in each panel. In all cases, the error-bars represent 1-$\sigma$ deviations in the measured variance over the relevant 10 realizations. The solid line gives the linear perturbation theory prediction for $\bar\xi_2$. (This is not perturbation theory in the sense of §2.2, but perturbation theory in the usual sense: an expansion in $\bar\xi_2$ or the density fluctuation amplitude; to avoid confusion, we will refer to it simply as PT.) The agreement of the solid line and the open circles on large scales indicates that the measured variance from the full box does not suffer from a significant bias, for the smoothing scales shown. The dashed line is the integral-constraint-bias prediction in eq. (27), which is in excellent agreement with the simulation results.



4.2. The Skewness $S_3$

The results for the skewness are shown in Fig. 3 & 4 for the SCDM and LCDM models. Again, because the LCDM has more power on large scales, this model shows a larger estimation-bias. Open circles show the mean $\langle \hat S_3 \rangle = \langle \hat{\bar\xi}_3 / \hat{\bar\xi}_2^{\,2} \rangle$ in 10 realizations of the full box ($L = 300\,h^{-1}$ Mpc). Filled triangles show the mean $\langle \hat S_3 \rangle$ over 10 realizations for each smaller box-size: $L = 300/2$, $300/4$, $300/5$, $300/7\ h^{-1}$ Mpc as indicated in each panel. The squares, on the other hand, show the mean $\langle \hat{\bar\xi}_3 \rangle$ divided by the square of the mean $\langle \hat{\bar\xi}_2 \rangle$, which is our way of isolating the integral-constraint-bias of $S_3$ (eq. 30).

The solid line corresponds to the tree-level PT prediction for $S_3$. Its agreement with the open circles on large scales indicates that the measured skewness from the full box does not suffer from any appreciable estimation-bias, for the smoothing scales shown. The short-dashed line is our analytical prediction for the integral-constraint-bias of $S_3$ (eq. 30, i.e. no ratio-bias), and is in good agreement with the simulation results (squares). The long-dashed line is the net estimation-bias of $S_3$ (eq. 29), which includes both the ratio-bias (eq. 31) and the integral-constraint-bias (eq. 30), and should therefore be compared with the triangles from the simulations. It can be seen that the ratio-bias of $S_3$ (eq. 31) always dominates on large scales, and that the integral-constraint-bias of $S_3$ is negligible except for the smallest subsamples.

The agreement between our analytical prediction and the simulation results is good as long as the estimation-bias is not too large. Our analytical prediction breaks down when the bias becomes too large, as in the case of $L = 300/7\ h^{-1}$ Mpc. This is hardly surprising as the calculation was done by explicitly assuming that the estimation-bias is small. We have found a phenomenological fit to correct for this: instead of having $\langle \hat S_3 \rangle = S_3 (1 + \Delta_{S_3}/S_3)$, we use

$$\langle \hat S_3 \rangle = S_3 \exp(\Delta_{S_3}/S_3) \tag{35}$$

where $\Delta_{S_3}/S_3$ is our linear estimate (in $\bar\xi_2^L$) for the fractional bias as before (eq. 29). The higher order terms from the exponential help partially cancel the over-prediction due to the linear term alone. This ansatz is shown as a dot-dashed line in Fig. 3 & 4. As can be seen, there is a reasonable agreement with the simulation results of both models, indicating that this is an acceptable extrapolation. An alternative would be to go beyond linear order in $\bar\xi_2^L$, and compute the estimation-biases to second order. We will not attempt to do so here.
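The difference between the linear prediction and the exponential ansatz of eq. (35) is easy to see numerically. In the minimal sketch below, the function names and the sample fractional biases are illustrative assumptions; only the two functional forms come from the text.

```python
import math


def biased_s3_linear(s3_true, frac_bias):
    """Linear prediction: <S3_hat> = S3 * (1 + Delta/S3)."""
    return s3_true * (1.0 + frac_bias)


def biased_s3_ansatz(s3_true, frac_bias):
    """Exponential ansatz of eq. (35): <S3_hat> = S3 * exp(Delta/S3)."""
    return s3_true * math.exp(frac_bias)


if __name__ == "__main__":
    s3_true = 3.0  # illustrative value
    for frac_bias in (-0.05, -0.3, -1.0, -1.5):
        print(
            f"Delta/S3 = {frac_bias:5.2f}   "
            f"linear = {biased_s3_linear(s3_true, frac_bias):6.3f}   "
            f"ansatz = {biased_s3_ansatz(s3_true, frac_bias):6.3f}"
        )
```

The two forms agree for small fractional bias, while for a fractional bias below $-1$ the linear form turns negative and the ansatz stays positive, which is the regime where the linear term alone over-predicts the effect.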

It should be emphasized that we have used the perturbation theory values for quantities such as $c_{mm'}$, $S_3$ and $\bar\xi_2$ in the analytical predictions for the various estimation-biases (eq. 27, 28 & 29). The good agreement on small scales between our analytical predictions and the numerical results above should be seen as somewhat fortuitous. Lacking actual measurements from simulations of the quantities $c_{mm'}$ on nonlinear scales, we will not attempt to do any better here. In practice, one might want to use improved determinations of $c_{mm'}$, etc. at nonlinear scales, from simulations for example, in eq. (27) to (31), or even go back to their original formulations in eq. (20), (22) and (24). But there are a few reasons why our PT-based analytical predictions should work fairly well: 1. the terms involving $\bar\xi_2^L/\bar\xi_2$ are only important on large scales, and so using the PT values is adequate for these terms; 2. the true $S_3$ and $c_{mm'}$ change only slowly with the smoothing scale (i.e. the tree-order PT predictions are not too far off); 3. most of the terms involve ratios of $c_{mm'}$ and $S_3$, which are perhaps even slower functions of the smoothing scale.
5. Simulated Galaxy Catalogues and a First Step Towards Reconciliation

To make contact with existing measurements from actual galaxy catalogues, we add three 
elements of realism to our simulations: 1. use box-sizes similar to those of surveys where some of 
the measurements of the variance and skewness have been made; 2. introduce redshift distortions 
for redshift catalogues and projection for angular-catalogues; 3. allow for a realistic level of (sparse) 
sampling. Galaxy-biasing is not implemented here, however. 

We simulate CfA/SSRS volume-limited catalogues similar to those studied in Gaztañaga (1992), based on the LCDM model. Redshift distortions are modeled in the usual way, with the distant observer approximation. We consider two sets of subsamples, taken from each of the 10 full-box ($L = 300\,h^{-1}$ Mpc) realizations of the LCDM model used in §4. The first set has the same volume as the CfA/SSRS50 catalogues in Table 1 (a cubic box of $L = 40\,h^{-1}$ Mpc on a side, with a volume equivalent to that of a sphere of radius $R_L = 25\,h^{-1}$ Mpc; the effective depth is $\mathcal{D} \simeq 50\,h^{-1}$ Mpc) and the second set has a volume similar to the CfA92 catalogue in Table 2 (a cubic box of $L = 78\,h^{-1}$ Mpc on a side, with a volume equivalent to that of a sphere of radius $R_L = 48\,h^{-1}$ Mpc; the effective depth is $\mathcal{D} \simeq 92\,h^{-1}$ Mpc). We have checked that the results in this section are essentially unchanged if, instead of a cubical box, one considers a conical geometry which resembles that of the actual surveys. This is in part because of the actual solid angle subtended by these surveys (~ 1.6; see caption of Table 1).

Following Gaztañaga (1992), we concentrate on counts-in-cells for spherical cells of radius $R$ in a range between 4 and 22 $h^{-1}$ Mpc, depending on the subsample. The lower limit is chosen to avoid too much shot-noise, whereas the upper limit is picked to avoid large edge-effects. We vary the number of particles/galaxies in each catalogue to assess the effect of shot-noise on the estimation-biases.

Simulated angular catalogues that resemble the APM are taken from Gaztañaga & Bernardeau (1998). The power spectrum is that measured from the APM. We consider two different box-sizes: $L = 378\,h^{-1}$ Mpc and $L = 600\,h^{-1}$ Mpc.



5.1. The Variance $\bar\xi_2$ in Simulated CfA/SSRS Catalogues

Figure 5 shows the integral-constraint-bias for $\bar\xi_2$ from the simulated CfA/SSRS catalogues. The point-symbols correspond to the values measured from the simulated catalogues, while the lines correspond to either values measured from the full-box LCDM simulations (long- and short-dashed lines) or analytical predictions for the integral-constraint-bias (solid lines). The variance measured in the nearby sample (CfA/SSRS50) is significantly smaller than that in the deeper sample (CfA/SSRS90), at the smoothing scale $R = 9\,h^{-1}$ Mpc. (Note that for clarity, we do not show the measurements from the deeper sample at smaller scales, but they follow those from the full-box LCDM simulation rather closely.) This is due to the systematic bias introduced by the integral constraint, which generally gets worse for a smaller survey volume (eq. 27; as shown also in Figure 2). This results in the phenomenon that the amplitude of the measured variance seems to increase with the sample depth, as found in several studies (e.g. Davis et al. 1988; Gaztañaga 1992; Bouchet et al. 1993). Bouchet et al. (1993) correctly attributed (a significant part of$^5$) this observed phenomenon to a finite-volume effect along the lines of Colombi et al. (1994). Seeing this as none other than the integral constraint allows us to predict the size of this bias analytically.

Note how, on large scales ($R > 9\,h^{-1}$ Mpc), the variance in redshift space (long-dashed line) is larger than that in real space (short-dashed line), as predicted by Kaiser (1987), whereas the reverse holds on small scales, because of shell crossing and virialization (the finger-of-God). The analytical predictions (eq. 20) for the integral-constraint-bias of both catalogues are shown as solid lines. We show the predictions in real space only, but it can be seen that the biased estimates of the variance in real and redshift space are in fact quite similar; we will see this even more clearly for the skewness. As mentioned before, these predictions are only approximate for non-linear scales ($R < 8\,h^{-1}$ Mpc), because we have not modeled properly the non-linear values of $c_{mm'}$ and $S_J$. Nevertheless there is an overall agreement between the simulation results and the predictions.

Lastly, we examine the effect of shot-noise on the estimation-bias, by sparsely sampling our catalogues (using 200 galaxies in each sub-sample instead of the $\sim 10^4$ in CfA/SSRS50 or $\sim 10^5$ in CfA/SSRS90). The effect can be seen to be small (compare closed triangles with open triangles).



5.2. The Skewness $S_3$ in Simulated CfA/SSRS and APM Catalogues

Figure 6 shows the results for $S_3$ in the simulated CfA/SSRS catalogues. Note how $S_3$ from N-body simulations is closer to the real space PT prediction (dotted line) when measured in redshift space (long-dashed line) than in real space (short-dashed line).

As before, we clearly see the variation of the estimation-bias with survey volume. The bias is more significant for CfA/SSRS50 than for CfA/SSRS90. The analytical prediction here comes from the phenomenological ansatz we introduce in §4.2 (eq. 35). Only the analytical prediction for real space is shown (solid lines). Note how the biased measurements of the skewness yield very similar values in real and redshift space, even though the true $S_3$'s (i.e. measured from the full box; long- and short-dashed lines) are quite different in the two cases, especially on small scales. In the next section, we will take advantage of this fact and attempt to perform a preliminary correction of some existing (biased) measurements of $S_3$ in redshift space (which we assume to be very close to their real-space values), using our real-space analytical prediction for the estimation-bias. In other words, lacking analytical predictions for the redshift-space $c_{mm'}$, we use the real-space PT values for the various hierarchical coefficients to make corrections for measurements that are actually done in redshift space.



5 Luminosity segregation also plays a role here. 
Figure 7 shows the measured 2D $S_3$ in simulations of APM-like angular catalogues. The results are reproduced from Gaztañaga & Bernardeau (1998). Two box-sizes are considered. The large box with $L = 600\,h^{-1}$ Mpc gives measurements which are in good agreement with the tree-level PT predictions on large scales, indicating that the estimation-bias is negligible in this case, at least for $\theta < 10$ deg., corresponding to $R < 70\,h^{-1}$ Mpc. Note that the actual APM survey has a size even bigger than this large box. The smaller box with $L = 378\,h^{-1}$ Mpc results in a more appreciable estimation-bias. Both of these results agree with our analytical predictions, shown as the short-dashed (long-dashed) line for the smaller (larger) box. The predictions can be obtained from eq. (25) by just replacing the 3D hierarchical amplitudes by the 2D ones, as our derivation was totally general in this respect. We use the LCDM model PT predictions with the APM selection function and assume that $c_{ij}^{2D} \simeq r_{i+j}\, c_{ij}^{3D}$, with $r_{i+j} \simeq 1$ as given in Gaztañaga (1994). These results are not very sensitive to the exact values of $r_{i+j}$.



5.3. A Preliminary Reconsideration of Some Existing Measurements from Actual Galaxy Surveys

5.3.1. The Variance $\bar\xi_2$ in the Actual IRAS, CfA and SSRS Surveys

Any corrections of existing measurements of $\bar\xi_2$ based on eq. (27) are necessarily model-dependent, because values for quantities such as $c_{12}$, $\bar\xi_2^L$ and even $\bar\xi_2$ itself need to be assumed. All of them vary with the amount and nature of galaxy-biasing. Instead of conducting a detailed analysis covering many possible models, we ask the following simpler question: assuming that the LCDM or SCDM model for the shape of $\bar\xi_2$ is the true one, and assuming the PT (real-space) values for $c_{12}$, what would be the corrected measurements of $\bar\xi_2$ for the IRAS, CfA and SSRS catalogues, using eq. (27), assuming no galaxy-biasing?



Fig. 8 shows the correlation length $R_0$, defined by $\bar\xi_2(R_0) = 1$, as a function of $R_L$, the equivalent radius of the corresponding subsample. Open circles (squares) correspond to the values of the CfA (SSRS) volume limited subsamples at the bottom of Table 1. Filled squares correspond to the values in the IRAS 1.2 Jy volume limited subsamples by Bouchet et al. (1993). The lines show the predictions for the measured $R_0$ taking into account the integral-constraint-bias for $\bar\xi_2$. We adopt the shape of the linear LCDM (continuous line) or the linear SCDM (dashed line) power spectrum to estimate all quantities, $\bar\xi_2^L$, $\bar\xi_2(R)$ and $c_{12}$, in eq. (27). This is only approximate as we do not take into account redshift distortions or non-linearities in the predictions, but note that in Fig. 5 we have found this approximation to be good.

The amplitude of the linear power spectrum is chosen to give the best fit to the data points in Fig. 8. The best fit values for the LCDM model are $R_0 \simeq 10.0\,h^{-1}$ Mpc (where $R_0$ here is the value we infer from the best-fit amplitude when matching our integral-constraint prediction with the data points) for the CfA/SSRS (top continuous line), which has a joint $\chi^2 = 5.0/6$, and $R_0 \simeq 6.5\,h^{-1}$ Mpc for IRAS (bottom continuous line), which has $\chi^2 = 37/20$. The best fit values for the SCDM model are $R_0 \simeq 8.0\,h^{-1}$ Mpc for the CfA/SSRS (top dashed line), which has $\chi^2 = 11.4/6$, and $R_0 \simeq 6.1\,h^{-1}$ Mpc for IRAS (bottom dashed line), which has $\chi^2 = 27/20$. Note that the SCDM model gives a poorer fit to the CfA/SSRS data, while IRAS is compatible with both models. These values are to be compared with the mean values (which are usually taken to give the true amplitude): $R_0 \simeq 8\,h^{-1}$ Mpc for the CfA/SSRS and $R_0 \simeq 5.5\,h^{-1}$ Mpc for IRAS. Note that our correction for $R_0$ here ignores galaxy-biasing. Under such an assumption, the large value of $R_0 \simeq 10\,h^{-1}$ Mpc (i.e. $\sigma_8 \simeq 1.2$ to $1.3$) found in the CfA/SSRS seems difficult to reconcile with the amplitude inferred from the angular APM Galaxy catalogue: $\sigma_8 < 1.08$ (see Gaztañaga 1995). To be strictly consistent, we should go back and allow for the effect of biasing on our correction for $R_0$; we will not attempt to do so here. Note also that the larger the volume limited subsample, the brighter the absolute magnitudes of the galaxies it contains; our correction for $R_0$ implicitly ignores luminosity segregation, i.e. it assumes that the intrinsic clustering does not change significantly as a function of the absolute magnitude of the galaxies. A direct measurement in redshift space in the Stromlo-APM Catalogue gives $\sigma_8 = 1.1 \pm 0.1$ for the brightest sample with $M_{b_J} < -20$ (sample d. in Table 3 in Loveday et al. 1996). This Stromlo-APM subsample contains galaxies with similar absolute magnitudes to those of the CfA92 sample, where $M_b < -20.3$, given that $b \simeq b_J - 0.3$ (Dalton & Gaztañaga 1998). Thus, our analysis indicates a small relative galaxy-bias between the CfA/SSRS and the APM galaxies that seems not attributable entirely to luminosity segregation.



5.3.2. The Skewness in the Actual CfA, SSRS and APM Catalogues

Again here, it is clear that any corrections of existing measurements of $S_3$ based on eq. (29) are necessarily model-dependent, because values for quantities such as $c_{mm'}$, $\bar\xi_2$, $\bar\xi_2^L$ and even $S_3$ itself need to be assumed. All of them vary with the amount and nature of galaxy-biasing. Instead of conducting a detailed analysis covering many possible models, we ask the following simpler question: assuming different shapes for the power spectrum all normalized to $\sigma_8 = 1$, and assuming the PT (real-space) values for $c_{mm'}$ and $S_3$, what would be the corrected measurements of $S_3$ for the CfA and SSRS catalogues, using eq. (29), or more appropriately, its extension in eq. (35), if no galaxy-bias is assumed?

Note that here, we are taking advantage of the finding in §5.2 (Fig. 6) that the biased measurements of $S_3$ yield very similar values in real and redshift space, and so we can simply apply the correction formulated in real space to the measurements in redshift space. On large scales, this is a safe assumption because even the true $S_3$'s are very similar in real and redshift space. On small scales, this assumption remains to be further scrutinized. Obviously, this assumption must break down when the survey-size is large enough (c.f. short- and long-dashed lines in Fig. 6).

We will concentrate on the last six measurements in Table 1. These published values of $S_3$ (Gaztañaga 1992) are the mean values in the corresponding range of smoothing scales shown in the 4th column. Here we will associate each with the mean value of $R$ in the respective range of scales, i.e. $R = 5$, $8$ & $15\ h^{-1}$ Mpc with $S_3 = 1.8$, $1.7$ & $1.7$ from the CfA50, CfA80 & CfA92 catalogues on the one hand, and $S_3 = 1.4$, $1.9$ & $2.2$ from the SSRS50, SSRS80 & SSRS115 catalogues on the other. The corrected values of $S_3$, adopting the assumptions stated above, are shown as lines in Fig. 9. The correction is larger for the smaller scales, which correspond to sub-samples of a smaller size. The SCDM predictions are much lower as this model has less power on large scales.

If there is no biasing, we can assume that the APM values for $S_3$ are close to the true ones, which is reasonable given the size of the APM (see e.g. Fig. 7). In this case we should concentrate on the continuous line in Fig. 9. At the larger smoothing scales, namely 8 & 15 $h^{-1}$ Mpc, the corrected values are quite consistent with the APM values. However, at the smallest smoothing scale of $5\,h^{-1}$ Mpc, the corrected $S_3$'s are significantly higher than the APM values. Three points should be noted here. First, at this smoothing scale, the correction is so large that the validity of the ansatz expressed in eq. (35) might be called into question. Second, the implicit assumption that the biased measurements of $S_3$ in real and redshift space yield similar values should be checked using more simulations. Third, it is in fact well known that the small-scale $S_3$ (of the mass) one would infer from an N-body simulation with an APM-like power spectrum and Gaussian initial conditions is larger than the measured $S_3$ from the APM survey (Baugh & Gaztañaga 1996). One possible interpretation is a scale-dependent galaxy-bias, which tends to diminish on large scales but becomes significant on small scales.

On large scales $R \gtrsim 8\,h^{-1}$ Mpc, we can safely say that most of the discrepancies in existing measurements of $S_3$ from the CfA/SSRS/APM catalogues can be explained by an estimation-bias. Remaining differences are attributable to a) a small relative galaxy-bias between the different surveys on large scales; b) redshift-distortions; c) deprojection effects (see Gaztañaga & Bernardeau 1998) and d) sampling fluctuations. In fact, our analysis as shown in Fig. 8 does support the existence of a galaxy-bias between the CfA/SSRS and the APM. To be strictly consistent, we should have taken this into account in our "correction" of the CfA/SSRS values for $S_3$. Doing so is beyond the scope of the present paper. Nonetheless, it should be emphasized that the amount of galaxy-bias one would infer based on measurements of $S_3$ is reduced if the estimation-bias is taken into account. A careful assessment would require the inclusion of all the above effects, and ideally, a re-measurement of $S_3$ from the different catalogues using methods that are perhaps less prone to the ratio-bias. We hope to pursue these in a future paper.



6. Discussion 



The main results of this paper are summarized in eq. (20), (22) & (25), with the associated useful approximations given in eq. (27), (28) & (29). Together they tell us the estimation-biases associated with the standard estimators for the variance $\bar\xi_2$, the third cumulant $\bar\xi_3$, and the skewness $S_3$ (eq. 11 & 13). The calculation is based on an expansion in the small parameter $\bar\xi_2^L$ (eq. 10), which is the variance smoothed on the scale of the survey, but otherwise does not assume or require the smallness of the variance on the scale of interest $R$.

From eq. (27), (28) & (29), it can be seen that the standard estimators are all asymptotically unbiased, in the sense that for a given smoothing scale $R$, the estimation-biases tend to zero as the survey-size increases. On the other hand, for a fixed survey-size, the estimation-biases become large as $R$ approaches the size of the survey.

There are two types of terms in the fractional estimation-biases, one dependent on the smoothing scale $R$, being proportional to $\bar\xi_2^L/\bar\xi_2$ where $\bar\xi_2$ is the variance smoothed on scale $R$, and the other not, being proportional to $\bar\xi_2^L$ only. In other words, a general form for the estimation-bias of an estimator $E$ can be represented by eq. (32), where $\alpha_1$ and $\alpha_2$ are coefficients that depend on the various hierarchical amplitudes, such as $S_3$ and $c_{mm'}$. For reasonable choices of the parameters $S_3$ and $c_{mm'}$, the estimation-biases are negative (see Fig. 1). The magnitude of the estimation-biases can be surprisingly large, especially for $E = S_3$, because of the large coefficients multiplying $\bar\xi_2^L$ or $\bar\xi_2^L/\bar\xi_2$.

Our analytical predictions are borne out by numerical experiments discussed in §4. Examples can be found in Fig. 2, 3 & 4. In cases where the estimation-biases are so large that the higher order terms in our expansion in $\bar\xi_2^L$ become important, we show that a simple ansatz works reasonably well (eq. 35).

In the case of $S_3$, we distinguish between two types of contributions to its estimation-bias. The 
standard estimator for $S_3$ is $\hat S_3 = \hat{\bar\xi}_3/\hat{\bar\xi}_2^{\,2}$. Part of the bias arises from the biases of the estimators $\hat{\bar\xi}_3$ 
and $\hat{\bar\xi}_2$ themselves: this is the integral-constraint-bias (eq. 30). The second part arises from the 
particular nonlinear combination of these two estimators, which we dub the ratio-bias (eq. 
31), or more precisely, the nonlinear-estimation-bias. It turns out that the second always dominates 
on large scales. 
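
A standard second-order (delta-method) expansion shows generically why the ratio of two unbiased estimators is itself biased. The sketch below is not a reproduction of eq. (31): it assumes hypothetical estimators $\hat a$ and $\hat b$ of a third cumulant $A$ and a variance $B$ that are individually unbiased, with small fluctuations $\delta a$ and $\delta b$ about the true values (these symbols are introduced here for illustration only).

```latex
\hat a = A + \delta a , \qquad \hat b = B + \delta b , \qquad
\langle \delta a \rangle = \langle \delta b \rangle = 0 ,
\\
\frac{\hat a}{\hat b^{2}}
  = \frac{A}{B^{2}} \left( 1 + \frac{\delta a}{A} \right)
    \left( 1 + \frac{\delta b}{B} \right)^{-2}
  \simeq \frac{A}{B^{2}} \left[ 1 + \frac{\delta a}{A} - 2 \frac{\delta b}{B}
    - 2 \frac{\delta a \, \delta b}{A B} + 3 \frac{\delta b^{2}}{B^{2}} \right] ,
\\
\left\langle \frac{\hat a}{\hat b^{2}} \right\rangle
  \simeq \frac{A}{B^{2}} \left[ 1
    - 2 \frac{\langle \delta a \, \delta b \rangle}{A B}
    + 3 \frac{\langle \delta b^{2} \rangle}{B^{2}} \right] .
```

The leftover terms are set by the variance of $\hat b$ and its covariance with $\hat a$. Both shrink as the sampled volume grows, which is why a bias of this kind depends on the survey size.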

We present a preliminary attempt to correct some existing measurements of the variance and 
skewness in §5. Our main conclusions are: a) the apparent increase of the measured variance with 
survey depth observed by some authors can be nicely explained by the integral-constraint-bias 
(Fig. 8; see e.g. Davis et al. 1988; Gaztañaga 1992; Bouchet et al. 1993); b) our analysis indicates 
a small relative galaxy-bias between the CfA/SSRS and the APM galaxies; c) the APM survey 
should give small estimation-biases for the standard estimator for $S_3$ on scales of interest (see 
Table 1); d) on large scales, $R \gtrsim 8\,h^{-1}$ Mpc, most of the differences between the measured skewness 
from the CfA/SSRS and the APM surveys can be attributed to the estimation-bias; however, a 
more careful analysis, taking into account redshift-space distortions, relative galaxy-bias (such as 
that due to luminosity segregation) and deprojection effects, is necessary to assess the significance of 
the remaining differences, and whether they are due to pure sampling fluctuations. 

Two areas clearly warrant further investigation. First, for the purpose of future surveys 
such as the SDSS or the AAT 2dF, the $\bar\xi_2^L$-dependent terms in the estimation-biases (the terms 
associated with $\alpha_1$ in eq. 32) will probably be unimportant, but the $\bar\xi_2^L/\bar\xi_2$-dependent terms (the 
$\alpha_2$-terms) can always become important for a sufficiently large smoothing scale $R$.^6 It would 
therefore be good to have an idea of what that scale is, not just for the variance and the skewness, 
but also for $S_N$ where $N > 3$. We will present results for a general $N$ in a separate paper. 

Second, we have focused in this paper exclusively on the standard estimators for $\bar\xi_N$ and $S_N$, 
where $\bar\xi_N$ is estimated by the standard counts-in-cells technique and $S_N$ is estimated by taking 
the appropriate ratio of $\bar\xi_N$ and $\bar\xi_2$. There are probably other estimators that suffer from smaller 
estimation-biases. For instance, Kim & Strauss (1998) recently introduced an interesting method 
to obtain $S_3$ by fitting the one-point probability distribution of counts (PDF) using an Edgeworth 
series. They obtained values higher than previous measurements from the same surveys (see Table 
1), indicating that their method is less susceptible to an estimation-bias. An extension of their 
method using a PDF which is better behaved in the presence of more significant nonlinearities is 
worth pursuing. 

Another possibility, which might reduce the ratio-bias in $S_N$ due to the division of $\bar\xi_N$ by $\bar\xi_2$, 
is the following: instead of dividing to obtain an estimate of $S_N$, fit a curve parametrized in some form to the 
two-dimensional plot of $\bar\xi_N(R)$ against $\bar\xi_2(R)$ at each smoothing scale $R$. This is akin to, for instance, 
how the Hubble constant is usually measured: instead of dividing some estimate of velocity by 
some estimate of distance for each data point followed by averaging, a linear $\chi^2$ fit to all points is 
performed. This method might give a smaller ratio-bias, but it has to be tested. This procedure 
has in fact been carried out before by e.g. Gaztañaga (1992) and Bouchet et al. (1993), who 
obtained values of $S_3$ close to that using the standard estimator. But a different parametrization 
of the relation between $\bar\xi_N(R)$ and $\bar\xi_2(R)$ (in other words, taking into account carefully the change 
of $S_N$ with $R$) might yield different results. 
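
The Hubble-constant analogy can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the mock distances, the uniform noise amplitudes, the value `TRUE_H` and the function name `hubble_estimates` are all hypothetical, and the fit is an unweighted least-squares line through the origin rather than a full $\chi^2$ analysis.

```python
import random


def hubble_estimates(n_points, true_h, rng):
    """Return (mean of per-point v/d ratios, least-squares slope through the origin)."""
    sum_ratio = 0.0
    sum_vd = 0.0
    sum_dd = 0.0
    for _ in range(n_points):
        d_true = rng.uniform(20.0, 100.0)        # true distance, arbitrary units
        d_obs = d_true + rng.uniform(-6.0, 6.0)  # noisy distance estimate (always > 0)
        v_obs = true_h * d_true + rng.uniform(-300.0, 300.0)  # noisy velocity estimate
        sum_ratio += v_obs / d_obs   # "divide, then average"
        sum_vd += v_obs * d_obs      # accumulators for the fit v = H * d
        sum_dd += d_obs * d_obs
    return sum_ratio / n_points, sum_vd / sum_dd


rng = random.Random(2024)
TRUE_H = 70.0  # assumed input value for the mock data
ratio_mean, fit_slope = hubble_estimates(50000, TRUE_H, rng)
print(f"mean of ratios: {ratio_mean:.2f}")
print(f"fitted slope:   {fit_slope:.2f}")
```

In this toy set-up neither procedure is exactly unbiased: averaging ratios errs high because $1/d$ is a convex function of the noisy distance, while the simple fit errs low because noise in the abscissa attenuates the slope. Which of the two is closer to the input depends on the noise model, which is why any such alternative has to be tested case by case.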

Perhaps the most important and obvious lesson of our investigation here is that nonlinear 
combinations of estimators should be used with caution. The only sure-fire way of avoiding an 
estimation-bias is through Monte-Carlo simulations, such as those performed in §4. There is nothing 
novel about this point, except that its importance has not been sufficiently emphasized in 
measurements of certain large scale structure statistics, such as $\bar\xi_N$ and $S_N$ discussed here. 
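
A minimal Monte-Carlo check of this kind can be sketched as follows. It is a toy stand-in, not the simulations of this paper: independent draws from an exponential distribution (variance 1, third cumulant 2) replace counts-in-cells, the unbiased k-statistics replace the standard estimators, and the names `k2_k3` and `ratio_experiment` are hypothetical.

```python
import random


def k2_k3(xs):
    """Unbiased k-statistics for the second and third cumulants of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    return k2, k3


def ratio_experiment(n, trials, rng):
    """Average k2, k3 and the ratio k3 / k2**2 over many mock samples of size n."""
    sum_k2 = sum_k3 = sum_ratio = 0.0
    for _ in range(trials):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        k2, k3 = k2_k3(xs)
        sum_k2 += k2
        sum_k3 += k3
        sum_ratio += k3 / k2 ** 2
    return sum_k2 / trials, sum_k3 / trials, sum_ratio / trials


rng = random.Random(12345)
TRUE_RATIO = 2.0  # exponential(1): third cumulant 2 divided by variance squared 1
small = ratio_experiment(n=20, trials=2000, rng=rng)    # small "survey"
large = ratio_experiment(n=2000, trials=200, rng=rng)   # large "survey"
for label, (k2, k3, ratio) in (("n=20", small), ("n=2000", large)):
    print(f"{label}: <k2>={k2:.3f} <k3>={k3:.3f} <k3/k2^2>={ratio:.3f}")
```

The averaged numerator and denominator scatter about their true values, while the averaged ratio for the small samples falls well below the input value and moves back towards it for the large samples. This mirrors, in a much simpler setting, the volume dependence discussed above.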

There is also strong interest in measuring such quantities outside galaxy surveys, e.g. for 
the transmission distribution of quasar spectra and for the convergence distribution in weak-lensing 
maps. The skewness of the former, for instance, provides a test of the gravitational-instability 
picture of the Lyman-α forest (Hui 1998), while the skewness of the latter is a sensitive probe of 
cosmology (Bernardeau et al. 1997). Measurements in these areas require caution similar to that needed in the 
case of galaxy surveys. Our methodology in deriving the estimation-bias for the standard estimator 
of skewness (ratio of the third cumulant to the second cumulant squared) could be adapted for such 
measurements (in the case of weak lensing, we learned as this work was being completed that a 
related calculation was done by Bernardeau et al. 1997). An important ingredient of our calculation 
is the use of the hierarchical relation $\bar\xi_N \sim \bar\xi_2^{N-1}$ in keeping track of the ordering, which might have 
to be modified for these other applications. 

6 However, for most current models of the power spectrum, as $R$ increases $\gamma$ becomes more negative and the 
coefficient $\alpha_2$ in $S_3$ becomes smaller (see Fig. 1). 

Also, for these applications, as well as for less conventional galaxy surveys such as the 
Lyman-break galaxy surveys at high redshifts (Steidel et al. 1998), one often has available several independent 
fields, for which a simple and obvious method should help to reduce much of the ratio-bias of 
the skewness: measure the third moment and the second moment for each field, separately average 
each of them over all fields, and only then combine these averaged moments to estimate 
the skewness. 
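
The procedure just described can be sketched with mock data. This is a toy illustration under loud assumptions: each "field" is a set of independent exponential draws (variance 1, third cumulant 2) rather than a correlated density field, unbiased k-statistics stand in for the moment estimators, and the names `k2_k3` and `skewness_two_ways` are hypothetical.

```python
import random


def k2_k3(xs):
    """Unbiased k-statistics for the second and third cumulants of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    return s2 / (n - 1), n * s3 / ((n - 1) * (n - 2))


def skewness_two_ways(n_fields, n_per_field, rng):
    """Return (mean of per-field ratios, ratio of field-averaged moments)."""
    k2s, k3s = [], []
    for _ in range(n_fields):
        xs = [rng.expovariate(1.0) for _ in range(n_per_field)]
        k2, k3 = k2_k3(xs)
        k2s.append(k2)
        k3s.append(k3)
    # Option 1: take the ratio in every field, then average the ratios.
    per_field = sum(k3 / k2 ** 2 for k2, k3 in zip(k2s, k3s)) / n_fields
    # Option 2: average each moment over the fields first, then take one ratio.
    mean_k2 = sum(k2s) / n_fields
    mean_k3 = sum(k3s) / n_fields
    return per_field, mean_k3 / mean_k2 ** 2


rng = random.Random(7)
TRUE_S3 = 2.0  # input value for the mock fields
trials = 400
results = [skewness_two_ways(25, 20, rng) for _ in range(trials)]
avg_per_field = sum(r[0] for r in results) / trials
avg_combined = sum(r[1] for r in results) / trials
print(f"ratio per field, then average: {avg_per_field:.3f}")
print(f"average moments, then ratio:   {avg_combined:.3f}")
```

Averaging the moments first reduces their scatter before the nonlinear step is taken, so the second option lands much closer to the input value in this toy. A residual bias remains, since the averaged moments still fluctuate.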

A related statistic that may suffer from a similar estimation-bias is $Q_N$, defined as the ratio of 
the $N$-point correlation to the sum of suitable permutations of products of the two-point functions 
(Fry 1984). The configuration dependence of $Q_3$, for instance, provides an elegant test of the 
galaxy-bias (Fry 1994). Common ways of estimating it, where estimators are divided by each other, are 
susceptible to the ratio-bias just as in the case of $S_N$. Another possibly problematic statistic is the 
ratio of the quadrupole to monopole power in redshift-space, or the ratio of the monopole power 
in redshift-space to that in real-space, which is often used to estimate the parameter $\beta = \Omega^{0.6}/b$ 
(see review by Hamilton 1997). An examination of published estimates of $\beta$ shows a large scatter 
even from the same surveys, with maximum likelihood methods yielding consistently higher values 
(Hamilton 1997), suggesting an estimation-bias of some sort might be lurking here. It is likely, 
however, that such attempts to measure $\beta$ are at least equally, if not more strongly, affected by 
our poor understanding of translinear distortions. We hope to pursue some of the above issues in 
future work. 
LICK 
Groth & Peebles 1977 
7 
48 ± 7 
LICK 
210 
Fry & Peebles 1978 
Szapudi et al. 1992 
Szapudi et al. 1996 
Gaztañaga 1994 
Szapudi et al. 1995 
4-20 
90 
Meiksin et al. 1992 


R0            S3            S4            Scales    Sample          R_L      Ref. 
6             2.4 ± 1.1     11 ± 13       2-20      IRAS 1.9Jy      35-60    Fry & Gaztañaga 1994 
7             2.1 ± 0.6     7.7 ± 5.2     3-10      IRAS 1.9Jy^z    35-40 
1.5 ± 0.5 
4.4 ± 3.7 
Peebles 1980 (eq. [57.9]) 
6.3 ± 1.6 
Fry & Gaztañaga 1994 
" 


R0            S3            S4            Scales    Sample          R_L      Ref. 
7             1.8 ± 0.2     5.4 ± 2.2     1-12      SSRS            25-50    " 
8             1.8 ± 0.2     5.2 ± 1.3     2-11      SSRS^z          25-50    " 
5.6 ± 0.4     1.8 ± 0.3     3.3 ± 1.5     2-8       CfA50^z         25       Gaztañaga 1992 
11.3 ± 0.8    1.7 ± 0.2     2.5 ± 1.1     4-12      CfA80^z         40       " 
9.8 ± 2.2     1.7 ± 0.5     3.0 ± 4.3     8-22      CfA92^z         50       " 
6.7 ± 0.5     1.4 ± 0.5     1.7 ± 2.3     2-8       SSRS50^z        25       " 
9.8 ± 2.0     1.9 ± 0.4     4.4 ± 2.0     4-12      SSRS80^z        40       " 
11.2 ± 2.5    2.2 ± 0.9     6.6 ± 8.5     8-22      SSRS115^z       60       " 



Table 1: Some measurements of the variance and $S_N$, for $N = 3, 4$, in the literature. The first 
column, $R_0$, is the scale at which the measured variance equals 1. The second and third columns give 
$S_3$ and $S_4$, from either counts-in-cells or the multi-point ratios $Q_N$, at the scales specified in the 
fourth column. In most cases, only the mean values for $S_N$ over a range of scales were published. 
In cases where measurements of the individual $S_N$ for each smoothing scale are reported in the 
literature, we quote the actual range of estimates over the corresponding range of scales. An 
estimate of the effective radius of each sample is given by $R_L = (\Omega/4\pi)^{1/3}\,\mathcal{D}$, where $\Omega$ is the solid 
angle of the survey and $\mathcal{D}$ is taken to be either the maximum depth in volume-limited samples 
or twice the mean depth in magnitude-limited samples. The effective volume for IRAS has been 
divided by two because there are two disconnected polar caps, and measurements were done by 
averaging the results from the two pieces. Samples with the z-superscript are in redshift-space, and 
those without are in angular-space. 
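
The effective radius quoted in the caption is the radius of a sphere with the same volume as a cone of solid angle $\Omega$ and depth $\mathcal{D}$. A minimal helper for evaluating it might look as follows; the function name `effective_radius` and the example inputs are hypothetical and are not taken from any of the surveys in the table.

```python
import math


def effective_radius(solid_angle_sr, depth):
    """R_L = (Omega / 4 pi)^(1/3) * D, returned in the same units as the depth D."""
    if not 0.0 < solid_angle_sr <= 4.0 * math.pi:
        raise ValueError("solid angle must lie in (0, 4*pi] steradians")
    return (solid_angle_sr / (4.0 * math.pi)) ** (1.0 / 3.0) * depth


# Illustrative inputs only: a full-sky sample and one covering an eighth of the sky,
# both with a depth of 50 (e.g. in h^-1 Mpc).
full_sky = effective_radius(4.0 * math.pi, 50.0)
eighth_sky = effective_radius(math.pi / 2.0, 50.0)
print(full_sky, eighth_sky)
```

Because the radius scales only as the cube root of the solid angle, a sample covering an eighth of the sky still has half the effective radius of a full-sky sample of the same depth.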
$\gamma = d\log\bar\xi_2/d\log R$ 

Fig. 1. Coefficients $\alpha_1$ (continuous line) and $\alpha_2$ (dashed line) for the fractional estimation-bias 
of an estimator $E$: $\Delta_E/E = \alpha_1 \bar\xi_2^L + \alpha_2 \bar\xi_2^L/\bar\xi_2$ (eq. 32), where $\bar\xi_2$ is the variance smoothed on scale $R$ and 
$\bar\xi_2^L$ is the variance smoothed on the scale of the survey. The top panel shows the coefficients for 
the variance $E = \bar\xi_2$ (eq. 27), and the bottom panel shows the coefficients for the skewness $E = S_3$ 
(eq. 29), while the middle panel shows the integral-constraint-bias contribution to $E = S_3$ (eq. 30). 

Fig. 2. The integral-constraint-bias of the variance-estimator $\hat{\bar\xi}_2$ in the LCDM model as a 
function of the smoothing scale $R$. Open circle: the average measured variance, $\langle\hat{\bar\xi}_2\rangle$, computed 
from 10 realizations of the full box ($L = 300\,h^{-1}$ Mpc). Filled triangle: the average measured 
variance computed using 10 subsamples, one extracted from each realization of the full box ($L = 
300\,h^{-1}\,{\rm Mpc}/M$, for $M = 3, 4, 5$ & 7 as labeled for each panel). The 1-$\sigma$ error-bars are computed 
from the dispersion of the measured variance around the mean over the 10 respective realizations. 
Short-dashed line: analytical prediction for the integral-constraint-bias (eq. 27). Solid line: 
tree-level PT prediction for $\bar\xi_2$. 

Fig. 3. The estimation-bias for $S_3$ in the SCDM model as a function of the smoothing scale $R$. 
Open circle: the mean of $S_3$ over 10 realizations of the full box ($L = 300\,h^{-1}$ Mpc). Filled triangle: 
the mean of $S_3$ over the corresponding subsamples of each realization of the full box (subsample 
box-sizes are $L = 300\,h^{-1}\,{\rm Mpc}/M$, for $M = 2, 4, 5$ & 7 as labeled for each panel). Filled square: 
the mean of $\hat{\bar\xi}_3$ divided by the mean of $\hat{\bar\xi}_2^{\,2}$ over the corresponding 10 subsamples (i.e. the 
integral-constraint-bias only). The error-bars are computed from the standard deviation of the measured 
values over the respective 10 realizations; for clarity, error-bars for the squares are only shown in 
one panel. Short-dashed line: the analytical prediction for the integral-constraint-bias of $S_3$ (eq. 
30). Long-dashed line: the analytical prediction for the net estimation-bias of $S_3$ (eq. 29). Solid 
line: the tree-level PT prediction for $S_3$. Dot-dashed line: a phenomenological ansatz for the net 
estimation-bias when it becomes large (eq. 35). 

Fig. 4. The estimation-bias for $S_3$ in the LCDM model, labeled in exactly the same way as in 
Fig. 3. 

Fig. 5. Measurements of $\bar\xi_2$ from simulated CfA/SSRS catalogues based on earlier LCDM 
simulations with $\Gamma = 0.2$ (see Fig. 2). Short- and long-dashed lines show the LCDM simulation 
results in real and redshift space respectively for the full box ($L = 300\,h^{-1}$ Mpc). The two sets of 
point-symbols (one on the left, one on the right) depict the measurements from two different 
volume-limited simulated catalogues: a) CfA/SSRS50 (for $R < 9\,h^{-1}$ Mpc), limited to $\mathcal{D} \simeq 50\,h^{-1}$ Mpc; b) 
CfA/SSRS90 (for $R > 9\,h^{-1}$ Mpc), limited to $\mathcal{D} \simeq 90\,h^{-1}$ Mpc. Square: real space, full sampling 
($\sim 10^4$ galaxies for CfA/SSRS50 & $\sim 10^5$ galaxies for CfA/SSRS90). Open triangle: redshift space, 
full sampling. Closed triangle with error-bars: redshift space, sparse sampling (200 galaxies). The 
solid line shows our analytical predictions of the integral-constraint-biases in real-space (eq. 27) 
for the respective catalogues. Note that 1. the larger simulated catalogue yields a larger measured 
variance at $9\,h^{-1}$ Mpc; 2. the sparse-sampling does not significantly affect the mean determination 
of the variance; 3. the real-space and redshift-space biased-estimations of the variance give very 
similar values. 

Fig. 6. Measurements of $S_3$ from simulated CfA/SSRS catalogues based on earlier CDM 
simulations with $\Gamma = 0.5$ (top panel) and $\Gamma = 0.2$ (bottom panel) (see Fig. 3 & 4). Dotted line shows the 
tree-level PT prediction for each model (e.g. $S_3 = 34/7 + \gamma$). Short- and long-dashed lines show 
the measured $S_3$ from simulations of the full box ($L = 300\,h^{-1}$ Mpc) in real and redshift space 
respectively. The point-symbols show measurements from two sets of volume-limited simulated 
catalogues: a) CfA/SSRS50 (left), limited to $\mathcal{D} \simeq 50\,h^{-1}$ Mpc; b) CfA/SSRS90 (right), limited to 
$\mathcal{D} \simeq 90\,h^{-1}$ Mpc. Square: real space, full sampling ($\sim 10^4$ galaxies for CfA/SSRS50 & $\sim 10^5$ 
galaxies for CfA/SSRS90). Open triangle: redshift space, full sampling. Closed triangle: redshift space, 
200 galaxies. The solid line shows our prediction of the estimation-bias for $S_3$, using the ansatz 
introduced in §4.2, in real space. Note how the biased-estimations (from the simulated catalogues) 
of the skewness give very similar values in real- and redshift-space, even though the true $S_3$'s (i.e. 
measured from the full box) are quite different in the two cases, especially at small smoothing 
scales. 

Fig. 7. Measurements of the projected $S_3$ (denoted by $s_3$ here) from simulated APM-like angular 
catalogues. Closed circle: the mean of $s_3$ over 20 angular catalogues from $L = 600\,h^{-1}$ Mpc 
simulations. Open circle: the mean of $s_3$ over 20 angular catalogues from $L = 378\,h^{-1}$ Mpc simulations. 
Solid line: the tree-level perturbation theory values of the projected $s_3$. The agreement of the 
solid line with the closed circles on large scales indicates that measurements of $s_3$ from the large 
box suffer from negligible estimation-biases. Short-dashed line: the analytical prediction for the 
estimation-bias of $s_3$, for the smaller simulation. Long-dashed line: the analytical prediction for 
the estimation-bias of $s_3$, for the larger simulation. 

Fig. 8. The correlation length, defined by $\bar\xi_2(R_0) = 1$, as a function of $R_L$, the equivalent radius of 
the volume-limited subsample. Open circles (squares) correspond to the values of the CfA (SSRS) 
subsamples at the bottom of Table 1. Filled squares correspond to the values in the IRAS 1.2 
Jy by Bouchet et al. (1993). The continuous (dashed) line shows the prediction for the measured 
$R_0$, which takes into account the integral-constraint-bias, for the LCDM (SCDM) model, whose 
power-spectrum amplitude is adjusted to fit the data points. 

Fig. 9. Open circles show the uncorrected values of $S_3$ in the SSRS (top) and CfA (bottom) 
subsamples at the end of Table 1. These should be compared to the deprojected APM values (filled 
squares). The lines show the values corrected for the estimation-bias assuming different 
power spectra to estimate the variances $\bar\xi_2^L$ and $\bar\xi_2(R)$ in eq. (11). The LCDM model prediction is 
shown as a short-dashed line, the linear SCDM model as a long-dashed line and the linear APM-like 
model as a continuous line. All cases are normalized to $\sigma_8 = 1$. The corrections are larger for 
smaller scales where the measured values are from smaller sub-samples. The SCDM predictions 
are much lower as it has less power on large scales. 



